[Lustre-discuss] Lustre MDS Errors 1-7 and operation 101
t.roth at gsi.de
Wed Jan 14 02:34:20 PST 2009
On our production cluster we have, for a surprisingly long time (> 1 day),
seen only the following two error messages (and no visible problems),
although the system is under heavy load right now:
Jan 14 10:44:33 server1 kernel: LustreError:
5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
(-107) req at ffff8107fd6c4c50 x2077599/t0 o101-><?>@<?>:0/0 lens 232/0 e
0 to 0 dl 1231927273 ref 1 fl Interpret:/0/0 rc -107/0
Jan 14 10:46:42 server1 kernel: LustreError:
6766:0:(mgs_handler.c:557:mgs_handle()) lustre_mgs: operation 101 on
Error (-107) is /* Transport endpoint is not connected */. I have
seen this before on clients which had lost the connection to the
cluster. But this is on the MGS/MDS: one server with one partition for
the MGS and one for the MDT.
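
For anyone who wants to pull the operation number and error code out of such messages automatically, here is a rough sketch, not anything Lustre ships. The function name parse_lustre_error and the small LINUX_ERRNO table are made up for illustration; the table hard-codes only the Linux value for 107 quoted above, and the regular expressions are assumptions based solely on the first message in this post.

```python
import re

# Assumption: Linux errno numbering. Only the value from this post is listed,
# hard-coded so the sketch does not depend on the local errno table.
LINUX_ERRNO = {107: ("ENOTCONN", "Transport endpoint is not connected")}

# First log message from this post, joined onto a single line.
LOG_LINE = (
    "LustreError: 5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) "
    "@@@ processing error (-107) req at ffff8107fd6c4c50 x2077599/t0 "
    "o101-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 1231927273 ref 1 "
    "fl Interpret:/0/0 rc -107/0"
)


def parse_lustre_error(line):
    """Hypothetical helper: return (opcode, errno_value); None where not found."""
    op = re.search(r"\bo(\d+)->", line)                     # e.g. "o101->"
    err = re.search(r"processing error \((-?\d+)\)", line)  # e.g. "(-107)"
    opcode = int(op.group(1)) if op else None
    code = abs(int(err.group(1))) if err else None
    return opcode, code


opcode, code = parse_lustre_error(LOG_LINE)
name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "unknown error"))
print(f"operation {opcode}: -{code} = {name} ({text})")
```

This only decodes the numbers; it says nothing about why the MGS reports the endpoint as unconnected.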
The second error suggests of course that the MGS is actually not
connected - but how can a Lustre system run when its MGS isn't there?
Makes no sense, does it?
O.k., the cluster is running Debian Etch 64bit, Kernel 2.6.22, Lustre
126.96.36.199. The "operation 101" thing is supposed to have been solved in
the 1.6.4 -> 1.6.5 upgrade, according to the change logs. Either it
hasn't, or I have a real problem where this error message really applies.
It is also remarkable that nobody seems to know the meaning of
"operation X on unconnected MGS". Via Google one finds many questions
but no answers; at least that's my impression (and I didn't search
Bugzilla).