[Lustre-discuss] Seeing OST errors on the OSS that doesn't have it mounted

Andreas Dilger adilger at sun.com
Wed Aug 27 00:51:04 PDT 2008


On Aug 26, 2008  22:56 -0700, Alex Lee wrote:
> I'm getting these error messages on my OSS. However, OST0006 is not mounted on this OSS (lustre-oss-0-0). OST0006 is visible to the server because lustre-oss-0-0 is the failover node for lustre-oss-0-1, which does mount OST0006.
> 
> Should I be worried about these errors? I don't understand why the OSS is even giving these errors, since there is no hardware issue that I can see. Also, since the OST is not mounted on that OSS, I would think it's "invisible" to the OSS.
> 
> I only get these errors during lustre usage. When the filesystem is not used I never get any errors.

When a client has a problem accessing a service (OST or MDT) on the primary
node (e.g. an RPC timeout), it will retry on the same node first, then try the
backup, and continue to try both until one of them answers.
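The retry behaviour described above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not Lustre's actual client recovery code: `attempt_order` and `connect_with_failover` are hypothetical names, and the schedule (one retry on the primary, then strict alternation between the nodes) is a simplification of "retry on the same node first, then try the backup".

```python
import errno
from typing import Callable, List, Optional, Sequence


def attempt_order(nodes: Sequence[str], max_attempts: int) -> List[str]:
    """Hypothetical model of the order in which a client tries the
    nodes serving one OST: the original attempt on the primary, one
    retry on the same node, then alternation between all nodes."""
    order = [nodes[0]]  # the original attempt on the primary
    i = 0
    while len(order) < max_attempts:
        order.append(nodes[i % len(nodes)])
        i += 1
    return order[:max_attempts]


def connect_with_failover(
    nodes: Sequence[str],
    try_connect: Callable[[str], int],
    max_attempts: int = 20,
) -> Optional[str]:
    """Return the node that accepted the connection, or None.

    try_connect(node) returns 0 on success or a negative errno. Every
    error, including -ENODEV ("OST not set up here"), just moves on to
    the next node, since a failover may still be in progress.
    """
    for node in attempt_order(nodes, max_attempts):
        if try_connect(node) == 0:
            return node
    return None


if __name__ == "__main__":
    # Simulated cluster (assumption): the primary always times out, and
    # the backup refuses twice with -ENODEV before the OST fails over.
    backup_replies = iter([-errno.ENODEV, -errno.ENODEV, 0])

    def fake_connect(node: str) -> int:
        if node == "lustre-oss-0-1":
            return -errno.ETIMEDOUT
        return next(backup_replies)

    print(connect_with_failover(["lustre-oss-0-1", "lustre-oss-0-0"],
                                fake_connect))  # prints lustre-oss-0-0
```

In the situation from this thread the OST never actually moves to lustre-oss-0-0, so every probe of that node is refused with -ENODEV, and each refusal is one of the errors the backup server logs.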

> Aug 23 12:27:52 lustre-oss-0-0 kernel: LustreError: 2918:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff81026f189a00 x52/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1219462372 ref 1 fl Interpret:/0/0 rc -19/0

The fact that lustre-oss-0-0 returns -ENODEV (-19, "No such device") isn't a
reason for the client to stop trying there, because it may take some time
for the OST to fail over from the primary server to the backup.
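The quoted log line can be read field by field. Here is a small sketch that pulls out the request opcode (`o8`) and the reply status (`rc -19/0`). It assumes that opcode 8 is `OST_CONNECT` in Lustre's OST wire protocol; the `OPCODES` table, `LOG_RE` and `decode` are hypothetical illustrations, not Lustre tools.

```python
import errno
import os
import re
from typing import Dict

# Assumption: opcode 8 is OST_CONNECT in Lustre's OST wire protocol.
# Only that entry is listed; other opcodes fall back to their number.
OPCODES = {8: "OST_CONNECT"}

# Capture the opcode ("o8->") and the reply status ("rc -19/") from a
# LustreError request line like the one quoted in this thread.
LOG_RE = re.compile(
    r"processing error \(-?\d+\).*? o(?P<opc>\d+)->.*? rc (?P<rc>-?\d+)/"
)


def decode(line: str) -> Dict[str, str]:
    """Hypothetical helper: summarise a request-error log line."""
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("not a LustreError request line")
    opc = int(m.group("opc"))
    rc = int(m.group("rc"))  # negative errno, e.g. -19
    return {
        "opcode": OPCODES.get(opc, f"opcode {opc}"),
        "errno": errno.errorcode.get(-rc, str(rc)),
        "meaning": os.strerror(-rc),
    }


if __name__ == "__main__":
    line = (
        "LustreError: 2918:0:(ldlm_lib.c:1536:target_send_reply_msg()) "
        "@@@ processing error (-19)  req@ffff81026f189a00 x52/t0 "
        "o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1219462372 "
        "ref 1 fl Interpret:/0/0 rc -19/0"
    )
    print(decode(line))
```

Read this way, the message on lustre-oss-0-0 is a client's connect request for an OST that this node does not have set up, refused with ENODEV, rather than a sign of a hardware fault on that server.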

What this really means is that your primary server is having network
trouble, or is so severely overloaded that the client has given up on
it and is trying the backup.  It could also be a problem on the client,
I guess.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



