[Lustre-discuss] odd ost disconnects during production

Andreas Dilger adilger at sun.com
Fri Jan 8 11:38:22 PST 2010


On 2010-01-08, at 12:19, John White wrote:
> 	We're getting some weird LustreError entries on a few OSTs in our  
> cluster but no real disruption of service.  Any ideas what might  
> cause such things?
>
> n0003: LustreError: 137-5: UUID 'lrc-OST0000_UUID' is not available   
> for connect (no target)
> n0003: LustreError: Skipped 2 previous similar messages
> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg())
> @@@ processing error (-19) req@ffff8102db286000 x1230507/t0
> o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1
> fl Interpret:/0/0 rc -19/0
> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg())
> Skipped 3 previous similar messages
>
>
> There are no further messages concerning this OST and the FS is  
> still in production accessing the OST with ease.  Are these clients  
> having a problem or OSSs?


Do you have failover configured?  It seems possible that the client  
first tries the backup OSS, which indeed doesn't have that OST  
configured and so replies with -19 (-ENODEV, "no target"), and then  
tries the primary OSS and is successful.
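
To make that sequence concrete, here is a toy model of it in Python. It
is only a sketch and not Lustre code: the server names, the SERVED
table, and both functions are hypothetical, and the order in which a
real client tries its servers is an assumption made for illustration.

```python
import errno

# Hypothetical layout: which OSTs each OSS is serving right now.
# The backup OSS serves nothing until a failover actually happens.
SERVED = {
    "oss-backup": set(),
    "oss-primary": {"lrc-OST0000_UUID"},
}


def try_connect(oss, target_uuid):
    """Return 0 if the OSS serves the target, else a negative errno."""
    if target_uuid in SERVED.get(oss, set()):
        return 0
    # Corresponds to the "no target" / -19 reply in the quoted log.
    return -errno.ENODEV


def connect_with_failover(target_uuid, oss_list):
    """Try each OSS in order; stop at the first successful connect."""
    attempts = []
    for oss in oss_list:
        rc = try_connect(oss, target_uuid)
        attempts.append((oss, rc))
        if rc == 0:
            break
    return attempts


attempts = connect_with_failover(
    "lrc-OST0000_UUID", ["oss-backup", "oss-primary"]
)
for oss, rc in attempts:
    print(oss, rc)
```

In this model the first attempt fails with -ENODEV on the server that
does not hold the OST and the second attempt succeeds, which would
match seeing an error on one node while the filesystem keeps working.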

Unfortunately, the "o8-><?>@<?>" field is supposed to say where the  
"o8" (OST_CONNECT) RPC is being sent, but here it prints only  
placeholders.  I suspect the debug message is slightly incorrect  
(i.e. a minor code bug) because the request has no connection from  
which to get this information.
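
If it helps when scanning many such messages, the fields discussed
above can be pulled out of the request portion of the log line with a
short script. This is a rough sketch: the regular expression is an
assumption based only on the single line quoted in this thread, and the
opcode table lists only the one opcode named here.

```python
import errno
import re

# Request portion of the quoted log line (address prefix omitted).
LINE = (
    "x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 "
    "dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0"
)

# Assumed layout: o<opcode>-><peer>:<...> ... rc <code>/<...>
PATTERN = re.compile(
    r"\bo(?P<opc>\d+)->(?P<peer>\S+?):\S+.*\brc (?P<rc>-?\d+)/"
)

# Only the opcode mentioned in this thread; not a complete table.
OPCODES = {8: "OST_CONNECT"}


def decode(line):
    """Extract opcode, peer field, and return code from a request line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    opc = int(m.group("opc"))
    rc = int(m.group("rc"))
    return {
        "opcode": opc,
        "opcode_name": OPCODES.get(opc, "unknown"),
        "peer": m.group("peer"),
        "rc": rc,
        # Negative return codes are negated errno values.
        "errno_name": errno.errorcode.get(-rc, "unknown") if rc < 0 else None,
    }


info = decode(LINE)
print(info)
```

For the line above this reports opcode 8 (OST_CONNECT), the peer field
"<?>@<?>", and rc -19, which the standard errno table names ENODEV.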

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



