[Lustre-discuss] yet another lustre error

Brock Palen brockp at umich.edu
Fri Mar 7 15:45:31 PST 2008


On a file system that's been up for only 57 days, I have:

505 lustre-log.   dumps.

The problem at hand is that one user has many jobs that are now
hung trying to create a directory from his PBS script.  On the
clients I see:

LustreError: 11-0: an error occurred while communicating with
141.212.30.184@tcp. The mds_connect operation failed with -16
LustreError: Skipped 2 previous similar messages

This shows up on every client his jobs are running on.
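
For reference, the -16 in that message is a negated Linux errno value, and errno 16 is EBUSY. A minimal Python sketch for decoding such codes follows; describe_rc is a hypothetical helper name of mine, not part of any Lustre tool:

```python
import errno

def describe_rc(rc):
    """Map a negative kernel-style return code (e.g. -16) to its errno symbol."""
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names for this platform.
    return errno.errorcode.get(code, "UNKNOWN")

print(describe_rc(-16))
```

This only names the code; it says nothing about why the MDS considers the client busy.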

In the most recent /tmp/lustre-log.  on the MDS/MGS I see this message:

@@@ processing error (-16)  req@000001001af9a600 x12808293/t0
o38->32633f05-02c6-50a5-b496-047150f1fe81@NET_0x200000aa4003e_UUID:-1
lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
ldlm_lib.c
target_handle_reconnect
nobackup-MDT0000: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting
ldlm_lib.c
target_handle_connect
nobackup-MDT0000: refuse reconnection from
34b4fbea-200b-1f7c-dac0-516b8ce786fc@10.164.0.111@tcp to 0x00000100069a7000;
still busy with 2 active RPCs
ldlm_lib.c
target_send_reply_msg
@@@ processing error (-16)  req@0000010019159a00 x11199816/t0
o38->34b4fbea-200b-1f7c-dac0-516b8ce786fc@NET_0x200000aa4006f_UUID:-1
lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
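
With 505 dumps to go through, a small script can list which clients the MDS is refusing. This is a rough Python sketch: the pattern is modelled only on the "refuse reconnection" message quoted above, it assumes each message sits on a single unwrapped line, and busy_clients is a hypothetical helper name:

```python
import re

# Pattern based solely on the message text quoted above; other Lustre
# versions may word it differently. Accepts "@" or the archive's " at ".
REFUSE_RE = re.compile(
    r"refuse reconnection from\s+(?P<uuid>[0-9a-f-]+)\s*(?:@|at)\s*(?P<nid>\S+)"
    r"\s+to\s+0x[0-9a-f]+;\s+still\s+busy\s+with\s+(?P<rpcs>\d+)\s+active\s+RPCs"
)

def busy_clients(log_text):
    """Return (uuid, nid, active_rpcs) for each refused reconnection found."""
    return [
        (m["uuid"], m["nid"], int(m["rpcs"]))
        for m in REFUSE_RE.finditer(log_text)
    ]

sample = (
    "nobackup-MDT0000: refuse reconnection from "
    "34b4fbea-200b-1f7c-dac0-516b8ce786fc@10.164.0.111@tcp "
    "to 0x00000100069a7000; still busy with 2 active RPCs"
)
for uuid, nid, rpcs in busy_clients(sample):
    print(uuid, nid, rpcs)
```

Running it over the text of each dump would show whether the same client UUIDs keep being refused.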


I also see messages about active RPCs in other logs.  What would
this mean?  Is something stuck someplace?



Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985




