[Lustre-discuss] OST disconnect messages on OSS

Reto Gantenbein reto.gantenbein at id.unibe.ch
Fri Sep 19 07:07:27 PDT 2008


Hi everybody

We just hit the same problem last evening. The OSTs were suddenly  
disconnecting from the OSS.

I saw that we had manually limited the number of OSS threads to 128,
while we are exporting 4 OSTs from that server and the file system is
mounted by about 100 clients. Could this be the issue? Were you able to
find the reason for your errors?

I will now remove this thread limitation and see if this helps.

Kind regards
Reto Gantenbein


On Aug 13, 2008, at 3:39 PM, Alex Lee wrote:

> I have a system that's been spitting out OST disconnect messages under
> heavy load. I'm guessing the OST eventually reconnects.
> I want to say this happens when the OSS is extremely overloaded, but I
> did notice it happening even under light load. Only the OSS seems to
> spit out any error messages; I don't see anything on the client side.
>
> Should I be concerned? Or does this typically happen at other sites too?
>
> -Alex
>
> Clip from one of the OSSes:
>
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available  for connect (no target)
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8101f4570600 x54/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218616308 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 3 previous similar messages
> Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: Skipped 3 previous similar messages
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0004_UUID' is not available  for connect (no target)
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff81010fc86600 x50/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218617636 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: 10984:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
> Aug 13 17:48:56 lustre-oss-0-1 kernel: LustreError: Skipped 1 previous similar message
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 137-5: UUID 'lfs-OST0005_UUID' is not available  for connect (no target)
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff81022861b400 x49/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218621159 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 13 18:47:39 lustre-oss-0-1 kernel: LustreError: 11070:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
>
>
> Different OSS:
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST0050_UUID' is not available  for connect (no target)
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8103d3b79a00 x124/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539929 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: 13527:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
> Aug 12 20:13:49 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST004f_UUID' is not available  for connect (no target)
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8103d3e92a00 x125/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539935 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: 13521:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
> Aug 12 20:13:55 lustre-oss-6-0 kernel: LustreError: Skipped 1 previous similar message
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 137-5: UUID 'lfs-OST004f_UUID' is not available  for connect (no target)
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  req@ffff8103d3983c00 x125/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 dl 1218539938 ref 1 fl Interpret:/0/0 rc -19/0
> Aug 12 20:13:58 lustre-oss-6-0 kernel: LustreError: 28121:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 5 previous similar messages
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
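
To see how often these rejections happen and which OSTs they involve, the quoted messages can be summarized with a small script. The -19 in them is a negated Linux errno, ENODEV ("No such device"), which is consistent with the "(no target)" wording. This is a hypothetical sketch, not a Lustre tool: the regexes, function names and the SAMPLE lines are assumptions written against the unwrapped syslog format shown above. Because of the "Skipped N previous similar messages" rate limiting, the counts it produces are only a lower bound.

```python
import errno
import os
import re
from collections import Counter

# Hypothetical pattern for the "137-5" console message quoted above.
# \s+ tolerates the double space in "available  for connect".
NO_TARGET_RE = re.compile(r"UUID '([^']+)' is not available\s+for connect \(no target\)")
# The request lines end with "rc <code>/<status>", e.g. "rc -19/0".
RC_RE = re.compile(r"\brc (-?\d+)/")

# Two lines copied from the log excerpt above, used only for the demo below.
SAMPLE = [
    "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID "
    "'lfs-OST0004_UUID' is not available  for connect (no target)",
    "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: "
    "11094:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-19)  "
    "req@ffff8101f4570600 x54/t0 o8-><?>@<?>:0/0 lens 240/0 e 0 to 0 "
    "dl 1218616308 ref 1 fl Interpret:/0/0 rc -19/0",
]


def summarize(lines):
    """Count 'no target' connect rejections per OST UUID and tally request return codes."""
    per_target = Counter()
    return_codes = Counter()
    for line in lines:
        target = NO_TARGET_RE.search(line)
        if target:
            per_target[target.group(1)] += 1
        rc = RC_RE.search(line)
        if rc:
            return_codes[int(rc.group(1))] += 1
    return per_target, return_codes


def describe_rc(rc):
    """Turn a negative errno such as -19 into its symbolic name and message text."""
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} = -{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # To scan a real log instead, pass an open file, e.g. summarize(open(path)).
    targets, codes = summarize(SAMPLE)
    for uuid, count in targets.most_common():
        print(f"{uuid}: {count} rejected connect(s)")
    for rc, count in codes.items():
        print(f"{describe_rc(rc)}: {count} request(s)")
```

Passing a whole syslog file (for example the OSS's /var/log/messages, path assumed) to summarize() gives a per-OST count that can be compared against load or the thread limit.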

-- 
Universität Bern
IT Services Department (Abt. Informatikdienste)
Central Systems Group

Reto Gantenbein
Administrator UBELIX

Gesellschaftsstrasse 6
CH-3012 Bern
Room -104
Tel.  +41 (0)31 631 87 97

