[Lustre-discuss] odd ost disconnects during production
John White
JWhite at lbl.gov
Fri Jan 8 14:44:42 PST 2010
Ah, yes, we do have failover configured. Thanks for the explanation.
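
For anyone finding this in the archive, the sequence Andreas describes below can be sketched roughly as follows. This is only an illustrative simulation, not Lustre code: the server names, the SERVERS table, and both functions are hypothetical. The only details taken from the log are the OST name and the -19 return code, which is ENODEV on Linux.

```python
import errno

# Hypothetical layout: which OSTs each OSS currently serves.
# "oss-backup" stands in for the failover partner, which does not
# have the OST mounted; "oss-primary" does.
SERVERS = {
    "oss-backup": set(),
    "oss-primary": {"lrc-OST0000"},
}

def try_connect(server, target):
    """Return 0 on success, or -ENODEV if the server does not serve the target."""
    if target in SERVERS[server]:
        return 0
    return -errno.ENODEV

def connect_with_failover(target, candidates):
    """Try each candidate server in order; return (server, attempts)."""
    attempts = []
    for server in candidates:
        rc = try_connect(server, target)
        attempts.append((server, rc))
        if rc == 0:
            return server, attempts
    return None, attempts

server, attempts = connect_with_failover(
    "lrc-OST0000", ["oss-backup", "oss-primary"]
)
for name, rc in attempts:
    print(f"{name}: rc {rc}")
print("connected to", server)
```

In this sketch the failed first attempt is the only one that would log an error, and the second attempt succeeds, which would match seeing the message on an OSS while the filesystem keeps working.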
On Jan 8, 2010, at 11:38 AM, Andreas Dilger wrote:
> On 2010-01-08, at 12:19, John White wrote:
>> We're getting some weird LustreError entries on a few OSTs in our cluster but no real disruption of service. Any ideas what might cause such things?
>>
>> n0003: LustreError: 137-5: UUID 'lrc-OST0000_UUID' is not available for connect (no target)
>> n0003: LustreError: Skipped 2 previous similar messages
>> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) @@@ processing error (-19) req@ffff8102db286000 x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0
>> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 3 previous similar messages
>>
>>
>> There are no further messages concerning this OST, and the FS is still in production and accessing the OST without trouble. Is the problem on the clients or on the OSSs?
>
>
> Do you have failover configured? It seems possible that the client is trying the backup OSS, which indeed doesn't have that OST configured, then tries the primary OSS and is successful.
>
> Unfortunately, the "o8-><?>@<?>" is supposed to say where the "o8" (OST_CONNECT) RPC is being sent, but I suspect the debug message is slightly incorrect (i.e. a minor code bug) because it has no connection from which to get this information.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
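
As a reading aid for the log line quoted above, here is a minimal sketch that pulls the RPC opcode and return code out of the tail of that message. The regular expressions and the decode() helper are my own assumptions about the layout of this one line, not an official Lustre log format. The opcode table contains only the single opcode named in this thread.

```python
import errno
import re

# Tail of the target_send_reply_msg() line quoted in this thread.
FRAGMENT = ("x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 "
            "dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0")

# Only the opcode identified in this thread; anything else is "unknown".
OPCODES = {8: "OST_CONNECT"}

def decode(fragment):
    """Extract the opcode and return code; return None if either is missing."""
    op = re.search(r"\bo(\d+)->", fragment)
    rc = re.search(r"\brc (-?\d+)/", fragment)
    if not op or not rc:
        return None
    opcode = int(op.group(1))
    code = int(rc.group(1))
    return {
        "opcode": opcode,
        "opcode_name": OPCODES.get(opcode, "unknown"),
        "rc": code,
        # Kernel-style return codes are negative errno values.
        "errno_name": errno.errorcode.get(abs(code), "unknown"),
    }

info = decode(FRAGMENT)
print(info)
```

On Linux, errno 19 is ENODEV ("No such device"), which is consistent with the "(no target)" text in the 137-5 message.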
----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720