[Lustre-discuss] Lustre 1.6.4.2 Error

Dilling dilling at zdv.uni-tuebingen.de
Fri Mar 21 11:15:18 PDT 2008


Hi,
some days ago one of my users started a large number of MATLAB jobs,
flooding all processors on our 40-node cluster (2 CPUs per node, 4-core
systems). More details and log files can be found in the attachment. As a
result I observed strange behavior from Lustre: ptlrpcd used 100% of one
CPU, and the second CPU was completely occupied by pwd, which was a child
of the MATLAB process invoked by the user. I/O on Lustre was partly
possible, but df reported "access denied". Recovery with the MDT started
after lustre.timeout=300 but did not complete. I had to reboot all nodes
that showed this behavior. The OST logged this message:
  Mar 14 16:47:05 cn46 kernel: LustreError: 138-a: lustre-OST0003: A client on nid 10.128.15.2@tcp was evicted due to a lock glimpse callback to 10.128.15.2@tcp timed out: rc -110
The client kernels reported soft lockups on all available cores.
Does anyone have an idea how to prevent such behavior? Thanks for your help.
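
For anyone going through the attached logs, eviction messages like the one quoted above can be filtered out of the OST syslog mechanically. The following is only a minimal sketch: the regular expression is an assumption derived from that single message, not from any Lustre log specification, and the names EVICTION_RE and parse_eviction are hypothetical. It also assumes the NID is written in its usual address@network form (the list archive renders "@" as " at ").

```python
import re

# Pattern for the OST eviction message quoted in this post. The field layout
# is an assumption based on that one line, not on a documented log format.
EVICTION_RE = re.compile(
    r"LustreError: (?P<code>\S+): (?P<target>\S+): A client on nid "
    r"(?P<nid>\S+) was evicted due to a lock (?P<kind>\w+) callback "
    r".*timed out: rc (?P<rc>-?\d+)"
)


def parse_eviction(line):
    """Return the fields of an eviction message as a dict, or None if no match."""
    match = EVICTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rc"] = int(fields["rc"])  # return code as an integer, e.g. -110
    return fields


# The message from this post, with the NID in address@network form.
sample = (
    "Mar 14 16:47:05 cn46 kernel: LustreError: 138-a: lustre-OST0003: A "
    "client on nid 10.128.15.2@tcp was evicted due to a lock glimpse "
    "callback to 10.128.15.2@tcp timed out: rc -110"
)

if __name__ == "__main__":
    print(parse_eviction(sample))
```

Running this over every line of an OST log and counting the results per nid would show whether the evictions are confined to the nodes that ran the MATLAB jobs.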

Regards
w.d.

--------------------------------------------------------------------------------
W.Dilling                               Tel.: (49) 7071/29-70206
Universitaet Tuebingen                  Fax.: (49) 7071/29-5912
Zentrum fuer Datenverarbeitung          mail: dilling at zdv.uni-tuebingen.de
Waechterstrasse 76
72074 Tuebingen

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_error_14.03.2008.tar
Type: application/x-tar
Size: 71680 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20080321/ff21f16f/attachment.tar>

