[Lustre-discuss] yet another lustre error

Aaron Knister aaron at iges.org
Sun Mar 9 19:01:46 PDT 2008


Hi! I have a few questions for you:

1. How many nodes was his job running on?
2. What version of Lustre and the Linux kernel are you running on your
servers/clients?
3. What ethernet module are you using on the servers/clients?

I honestly am not sure what the RPC errors mean but I've had similar  
issues caused by ethernet-level errors.
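
In case it helps with question 3, here is a rough sketch of how I would
check a Linux node for ethernet-level errors. It assumes the standard
per-interface counters that the kernel exposes under
/sys/class/net/<iface>/statistics. The function name, the list of
counters, and the "eth0" interface name are my own placeholders, not
anything Lustre-specific.

```python
from pathlib import Path

# Counter files the Linux kernel exposes per interface in sysfs.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped", "rx_crc_errors")


def read_nic_errors(iface, sysfs_root="/sys/class/net"):
    """Return {counter: value} for one interface, skipping missing counters."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    result = {}
    for name in COUNTERS:
        counter_file = stats_dir / name
        if counter_file.is_file():
            result[name] = int(counter_file.read_text().strip())
    return result


if __name__ == "__main__":
    # "eth0" is only a placeholder interface name.
    for name, value in read_nic_errors("eth0").items():
        print(f"{name}: {value}")
```

Counters that keep climbing between two runs on the MDS or on the
affected clients would be worth a closer look.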

-Aaron

On Mar 7, 2008, at 6:45 PM, Brock Palen wrote:

> On a file system that's been up for only 57 days, I have:
>
> 505 lustre-log.   dumps.
>
> The problem at hand is that a user has many jobs that are now
> hung trying to create a directory from his PBS script. On the
> clients I see:
>
> LustreError: 11-0: an error occurred while communicating with
> 141.212.30.184@tcp. The mds_connect operation failed with -16
> LustreError: Skipped 2 previous similar messages
>
> On every client his jobs are on.
>
> In the most recent /tmp/lustre-log.  on the MDS/MGS I see this  
> message:
>
> @@@ processing error (-16)  req@000001001af9a600 x12808293/t0
> o38->32633f05-02c6-50a5-b496-047150f1fe81@NET_0x200000aa4003e_UUID:-1
> lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
> ldlm_lib.c
> target_handle_reconnect
> nobackup-MDT0000: 34b4fbea-200b-1f7c-dac0-516b8ce786fc reconnecting
> ldlm_lib.c
> target_handle_connect
> nobackup-MDT0000: refuse reconnection from
> 34b4fbea-200b-1f7c-dac0-516b8ce786fc@10.164.0.111@tcp to
> 0x00000100069a7000; still busy with 2 active RPCs
> ldlm_lib.c
> target_send_reply_msg
> @@@ processing error (-16)  req@0000010019159a00 x11199816/t0
> o38->34b4fbea-200b-1f7c-dac0-516b8ce786fc@NET_0x200000aa4006f_UUID:-1
> lens 304/200 ref 0 fl Interpret:/0/0 rc -16/0
>
>
> I see messages about active RPCs in other logs.  What would
> this mean?  Is something stuck someplace?
>
>
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
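
On the -16 itself: as far as I know, Lustre reports return codes as
negated Linux errno values, and errno 16 is EBUSY, which would fit the
"still busy with 2 active RPCs" line. Below is a small sketch that
decodes a return code and counts refused reconnections per client in a
log that is already in text form. The regex is based only on the single
message quoted above, and the function names are made up for
illustration.

```python
import errno
import os
import re
from collections import Counter

# Matches the "refuse reconnection" message quoted above. Accepts both
# "uuid@nid" and the " at " form that list archives substitute for "@".
REFUSED = re.compile(
    r"refuse reconnection from (?P<uuid>[0-9a-f-]+)\s*(?:@|at)\s*(?P<nid>\S+)"
    r" to \S+; still busy with \d+ active RPCs"
)


def describe_rc(rc):
    """Map a negated errno such as -16 to (symbolic name, description)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def refused_reconnects(lines):
    """Count refused reconnections per (client UUID, NID) pair."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[(match.group("uuid"), match.group("nid"))] += 1
    return counts


if __name__ == "__main__":
    name, text = describe_rc(-16)  # errno 16 is EBUSY on Linux
    print(name, text)
```

If one or two clients account for most of the refusals, that would at
least narrow down where to look for the RPCs that never finish.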

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org






