[Lustre-discuss] odd ost disconnects during production

John White JWhite at lbl.gov
Fri Jan 8 14:44:42 PST 2010


Ah, yes, we do have failover configured.  Thanks for the explanation.

On Jan 8, 2010, at 11:38 AM, Andreas Dilger wrote:

> On 2010-01-08, at 12:19, John White wrote:
>> 	We're getting some weird LustreError entries on a few OSTs in our cluster but no real disruption of service.  Any ideas what might cause such things?
>> 
>> n0003: LustreError: 137-5: UUID 'lrc-OST0000_UUID' is not available  for connect (no target)
>> n0003: LustreError: Skipped 2 previous similar messages
>> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8102db286000 x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0
>> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 3 previous similar messages
>> 
>> 
>> There are no further messages concerning this OST, and the FS is still in production, accessing the OST with ease.  Is the problem on the clients or on the OSSs?
> 
> 
> Do you have failover configured?  It seems possible that the client is trying the backup OSS, which indeed doesn't have that OST configured, then tries the primary OSS and is successful.
> 
> Unfortunately, the "o8-><?>@<?>" is supposed to say where the "o8" (OST_CONNECT) RPC is being sent, but I suspect the debug message is slightly incorrect (i.e. a minor code bug) because it has no connection from which to get this information.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
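
For anyone reading these messages later, here is a minimal sketch of how the two fields discussed above (the "o8" opcode and the "rc -19" status) could be pulled out of such a line.  The regular expression and the decode() helper are my own illustration, not part of Lustre, and the opcode table holds only the one opcode named in this thread.  On Linux, errno 19 is ENODEV, which fits an OSS that does not have the target configured.

```python
import errno
import re

# Only opcode 8 (OST_CONNECT) is named in this thread; the table is
# deliberately incomplete and any other opcode is reported as "unknown".
OPCODES = {8: "OST_CONNECT"}

# Hypothetical pattern: the opcode field ("o8->") followed later on the
# same line by the status field ("rc -19/0").
LINE_RE = re.compile(r"\bo(?P<opc>\d+)->.*\brc (?P<rc>-?\d+)/")

# Tail of the log line quoted in this thread.
SAMPLE = ("x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 "
          "dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0")


def decode(line):
    """Return the opcode and status found in a log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    opc = int(m.group("opc"))
    rc = int(m.group("rc"))
    # Negative status values are negated errno numbers.
    if rc < 0:
        err = errno.errorcode.get(-rc, "unknown")
    else:
        err = "OK"
    return {
        "opcode": opc,
        "opcode_name": OPCODES.get(opc, "unknown"),
        "rc": rc,
        "errno_name": err,
    }


if __name__ == "__main__":
    print(decode(SAMPLE))
```

Running decode() over the output of a log search gives a quick way to see whether every failed connect carries the same opcode and status, which is what one would expect if the messages all come from clients probing the failover OSS.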

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720