[lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server

Thomas Roth t.roth at gsi.de
Wed Mar 2 23:40:44 PST 2022


Dear all,

this might be just something I forgot or did not read thoroughly, but shouldn't a 2.12.7 client work with 2.12.8 servers?

The 2.12.8-changelog has the standard disclaimer
> Interoperability Support:
>    Clients & Servers: Latest 2.10.X and Latest 2.11.X



I have this test cluster that I upgraded recently to 2.12.8 on the servers.

The first client I attached now is a fresh install of rhel 8.5 (Alma).
I installed 'kmod-lustre-client' and 'lustre-client' from https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/
I copied a directory containing ~5000 files - no visible issues


The next client was also installed with rhel 8.5 (Alma), but now using 'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from
https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/

As on my first client, I copied a directory containing ~5000 files. The copy stalled, and the OSTs exploded in my face

> kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io
> kernel: LustreError: 40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 too small for magic/version check
> kernel: LustreError: 40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking request from 12345-10.20.2.167@o2ib6 x1726208297906176
> kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) event type 2, status -103, service ost_io


The latter message is repeated ad infinitum.

The client log blames the network:
> Request sent has failed due to network error
>  Connection to was lost; in progress operations using this service will wait for recovery to complete

> LustreError: 181316:0:(events.c:205:client_bulk_callback()) event type 1, status -103, desc 0000000086e248d6
> LustreError: 181315:0:(events.c:205:client_bulk_callback()) event type 1, status -5, desc 00000000e569130f
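
For reference, the negative status values in these messages are kernel errno codes. Below is a minimal sketch, assuming the standard Linux errno numbering (103 = ECONNABORTED, 5 = EIO), that pulls the status out of log lines like the ones quoted above. The helper name decode_status and the small lookup table are only illustrative and not part of any Lustre tool.

```python
import re

# Linux errno values (assumption: generic Linux numbering). Hardcoded so the
# result does not depend on the platform this script happens to run on.
LINUX_ERRNO = {5: "EIO", 103: "ECONNABORTED"}

STATUS_RE = re.compile(r"status (-?\d+)")


def decode_status(line):
    """Return (status, errno_name) for a log line, or None if it has no status field."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1))
    # Kernel code reports errors as negative errno values.
    return status, LINUX_ERRNO.get(-status, "unknown")


lines = [
    "LustreError: 23345:0:(events.c:310:request_in_callback()) "
    "event type 2, status -103, service ost_io",
    "LustreError: 181315:0:(events.c:205:client_bulk_callback()) "
    "event type 1, status -5, desc 00000000e569130f",
]

for line in lines:
    print(decode_status(line))
```

Under that assumption, -103 reads as an aborted connection and -5 as a generic I/O error, which fits the client's "network error" wording but does not by itself say which side caused it.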



There is also a client running Debian 9 and Lustre 2.12.6 (compiled from git) - no trouble at all.


Then I switched those two rhel8.5 clients: I reinstalled the OS, gave the first one the 2.12.7 packages and the second one the 2.12.8 packages, and the error
followed: again the client running 'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing the same error messages in the logs.
So this is not a network problem in the sense of broken hardware etc.


What did I miss?
Some important Jira I did not read?


Regards
Thomas


-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz


