[Lustre-discuss] Lustre MDS Errors 1-7 and operation 101

Thomas Roth t.roth at gsi.de
Thu Jan 15 03:54:40 PST 2009

Thank you for this clarification on the operation X message!
"Running" Lustre without MGS or even MDT is something I have tested
already - involuntarily ;-)
But I was confused because in this case there were new mounts coming in
all the time, so the MGS was clearly there and answering, while at the
same time Lustre was complaining about an unconnected MGS.


Cliff White wrote:
> Thomas Roth wrote:
>> Hi all,
>> on our production cluster we have for a surprisingly long time (> 1 day)
>> only the following two error messages (and no visible problems),
>> although the system is under heavy load right now:
>> Jan 14 10:44:33 server1 kernel: LustreError: 5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-107)  req@ffff8107fd6c4c50 x2077599/t0 o101-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 1231927273 ref 1 fl Interpret:/0/0 rc -107/0
>> and:
>> Jan 14 10:46:42 server1 kernel: LustreError: 6766:0:(mgs_handler.c:557:mgs_handle()) lustre_mgs: operation 101 on unconnected MGS
>> Error -107 is ENOTCONN (/* Transport endpoint is not connected */). I
>> have seen this before on clients that had lost their connection to the
>> cluster. But this time it is on the MGS/MDS: one server with one
>> partition for the MGS and one for the MDT.
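To make the two messages quoted above easier to read, here is a minimal Python sketch that pulls out the fields that matter: the source location, the request opcode (the o101 in the first line, the 101 in the second) and the return code. The decode function, its regular expressions and the opcode-name table are hypothetical helpers written for illustration, not part of Lustre. The names LDLM_ENQUEUE (101) and MGS_CONNECT (250) are assumed from Lustre's lustre_idl.h header and should be checked against the headers of the version you actually run.

```python
import errno
import re

# Hypothetical helper, not a Lustre tool. The patterns assume the console
# format shown above: "PID:0:(file.c:LINE:function())", an "oNNN->" opcode
# field and an "rc N/M" status field. Other Lustre versions may differ.
PREFIX = re.compile(
    r"(?P<pid>\d+):\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
)
REQ_OPCODE = re.compile(r"\bo(?P<opc>\d+)->")
MGS_OPCODE = re.compile(r"operation (?P<opc>\d+) on unconnected MGS")
STATUS = re.compile(r"\brc (?P<rc>-?\d+)/")

# Assumed opcode names (from lustre_idl.h); verify for your Lustre version.
OPCODE_NAMES = {101: "LDLM_ENQUEUE", 250: "MGS_CONNECT"}


def decode(line: str) -> dict:
    """Extract source location, opcode and return code from one log line."""
    info = {}
    m = PREFIX.search(line)
    if m:
        info.update(pid=int(m["pid"]), file=m["file"],
                    line=int(m["line"]), func=m["func"])
    # Request dumps carry "oNNN->"; the MGS message spells out the opcode.
    m = REQ_OPCODE.search(line) or MGS_OPCODE.search(line)
    if m:
        info["opcode"] = int(m["opc"])
        info["opname"] = OPCODE_NAMES.get(info["opcode"], "unknown")
    m = STATUS.search(line)
    if m:
        info["rc"] = int(m["rc"])
        # Negative rc values are negated errno codes; errno.errorcode maps the
        # number to its symbolic name (107 is ENOTCONN on Linux).
        info["errno"] = errno.errorcode.get(-info["rc"], "unknown")
    return info
```

Applied to the first message, decode should report opcode 101 from target_send_reply_msg() with rc -107, which maps to ENOTCONN on Linux; the second message yields opcode 101 from mgs_handle() in mgs_handler.c and carries no rc field.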
> Remember, this is a distributed client/server system. When any node
> needs to connect to a service, there will be a client process.
> So, an OSS (which needs to talk to the MDS) will have a metadata client
> (mdc) running on it.
>> The second error suggests of course that the MGS is actually not
>> connected - but how can a Lustre system run when its MGS isn't there?
>> Makes no sense, does it?
> Ah, that's the beauty of Lustre. The MGS is needed for two things:
> - New clients get their mount configuration from the MGS.
> - Configuration changes are propagated from the MGS.
> So if you are not actively mounting clients and not changing the
> configuration, Lustre can in fact run just fine without the MGS.
> Filesystem users will not even notice it's gone unless they are
> attempting a mount.
> Likewise, the MDS is used for metadata transactions. If a client is not
> actively touching metadata (for example, it already has an open file
> and is only doing I/O), you can fail the MDS without the client
> noticing.
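Cliff's point can be restated as a dependency table: an action only notices a server outage if it actually needs that server. The following Python sketch is a deliberately simplified model under that assumption. The action names, the Server enum and the DEPENDS_ON table are hypothetical, and only the dependencies described in this thread are modeled (a real mount, for example, contacts more than the MGS).

```python
from enum import Enum


class Server(Enum):
    MGS = "management server"
    MDS = "metadata server"
    OSS = "object storage server"


# Hypothetical action names. Only the dependencies discussed in this thread
# are modeled: mounts and configuration changes need the MGS, metadata
# operations need the MDS, and I/O on an already open file needs only the OSS.
DEPENDS_ON = {
    "mount": {Server.MGS},
    "config_change": {Server.MGS},
    "open": {Server.MDS},
    "stat": {Server.MDS},
    "create": {Server.MDS},
    "read_open_file": {Server.OSS},
    "write_open_file": {Server.OSS},
}


def noticed_by(actions, down):
    """Return the actions (in order) that would notice `down` being unavailable."""
    return [a for a in actions if down in DEPENDS_ON[a]]
```

For example, noticed_by(["read_open_file", "write_open_file"], Server.MDS) returns an empty list, matching the case of a client that already has a file open and is only doing I/O, while noticed_by(list(DEPENDS_ON), Server.MGS) returns only the mount and configuration-change actions.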
> Those two errors are quite harmless in this case. 'operation x on
> unconnected MGS' means a client was evicted: the client is attempting
> to replay an RPC, but the server has destroyed the import (due to the
> eviction) and it has not yet been re-established.
> cliffw
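The mechanism Cliff describes can be illustrated with a toy server that keeps per-client connection state, discards it on eviction, and rejects every request except a connect from a client it no longer has state for. This is a Python sketch, not Lustre's mgs_handle() code; the ToyMGS class is hypothetical, and the value MGS_CONNECT = 250 is an assumption taken from Lustre's lustre_idl.h.

```python
import errno

MGS_CONNECT = 250  # assumed opcode value (lustre_idl.h); check your version


class ToyMGS:
    """Toy model of the check behind 'operation X on unconnected MGS'.

    Not Lustre code: it only mimics a server that tracks connected clients,
    forgets a client when it is evicted, and refuses any non-connect request
    from a client it does not know.
    """

    def __init__(self):
        self.connected = set()
        self.log = []

    def handle(self, client, opc):
        if opc == MGS_CONNECT:
            # A (re)connect re-establishes the per-client state.
            self.connected.add(client)
            return 0
        if client not in self.connected:
            # Replayed or late request after eviction: reject it.
            self.log.append(f"lustre_mgs: operation {opc} on unconnected MGS")
            return -errno.ENOTCONN
        return 0

    def evict(self, client):
        self.connected.discard(client)
```

A client that connects, is evicted and then replays opcode 101 gets -ENOTCONN and produces the log line from the original report; once it reconnects, the same request succeeds again, which is why the message is harmless unless it keeps recurring for the same client.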
>> O.k., the cluster is running Debian Etch 64bit, Kernel 2.6.22, Lustre
>> The "operation 101" message is supposed to have been fixed in the
>> 1.6.4 -> 1.6.5 upgrade, according to the change logs. Either it
>> hasn't been, or I have a real problem where this error message applies.
>> It is also remarkable that nobody seems to know the meaning of
>> "operation X on unconnected MGS": via Google one finds many questions
>> but no answers, at least that's my impression (I didn't search
>> Bugzilla).
>> Many thanks,
>> Thomas

Thomas Roth
Department: Information Technology (Informationstechnologie)
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt

Limited liability company (Gesellschaft mit beschränkter Haftung)
Registered office: Darmstadt
Commercial register: Darmstadt Local Court (Amtsgericht Darmstadt), HRB 1528

Managing Director: Professor Dr. Horst Stöcker

Chair of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph,
Deputy: Ministerialdirigent Dr. Rolf Bernhardt
