[Lustre-discuss] Lustre MDS errors -107 and operation 101

Thomas Roth t.roth at gsi.de
Wed Jan 14 02:34:20 PST 2009


Hi all,

For a surprisingly long time (more than a day), our production cluster has
logged only the following two error messages, with no visible problems,
even though the system is currently under heavy load:

Jan 14 10:44:33 server1 kernel: LustreError:
5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-107)  req@ffff8107fd6c4c50 x2077599/t0 o101-><?>@<?>:0/0 lens 232/0 e
0 to 0 dl 1231927273 ref 1 fl Interpret:/0/0 rc -107/0

and:

Jan 14 10:46:42 server1 kernel: LustreError:
6766:0:(mgs_handler.c:557:mgs_handle()) lustre_mgs: operation 101 on
unconnected MGS


Error -107 is ENOTCONN, "Transport endpoint is not connected". I have
seen it before on clients that had lost their connection to the
cluster, but this is on the MGS/MDS: a single server with one partition
for the MGS and one for the MDT.
The second message, of course, suggests that the MGS is not actually
connected. But how can a Lustre system run without its MGS? That makes
no sense, does it?

For reference, the cluster runs Debian Etch 64-bit with kernel 2.6.22 and
Lustre 1.6.5.1. According to the change logs, the "operation 101" issue
was fixed in the upgrade from 1.6.4 to 1.6.5. Either it wasn't, or I have
a real problem to which this error message genuinely applies.

It is also remarkable that nobody seems to know what "operation X on
unconnected MGS" means: a Google search turns up many questions but no
answers, at least in my impression (I didn't search Bugzilla).

Many thanks,
Thomas
