[Lustre-discuss] lvbo_init failed

Andreas Dilger adilger at sun.com
Thu Jul 17 16:00:52 PDT 2008


On Jul 17, 2008  08:05 -0400, Charles Taylor wrote:
> We are getting lots of these (always for the same resource) on one of  
> our OSSs.
>
> LustreError: 22308:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 1 Time(s)
> LustreError: 22225:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 1 Time(s)
> LustreError: 22277:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 2 Time(s)
> LustreError: 22274:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 3 Time(s)
> LustreError: 22204:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 1 Time(s)
> LustreError: 22193:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 2 Time(s)
> LustreError: 22253:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 1 Time(s)
> LustreError: 22200:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 2 Time(s)
> LustreError: 22264:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init 
> failed for resource 5820180: rc -2: 1 Time(s)
>
> We've tried to track down the "object" with "lfs find" but no joy so
> far. I'm not even sure that is the right approach. We found a bug
> pertaining to this in the Lustre bugzilla but it looks like it was
> resolved, so I'm not sure that's the issue either. Has anyone else run
> into this before? Is there something we can do to stop it?

This is an indication that some object is missing on the OST that one
or more clients are trying to access.  You can look at the Lustre debug
logs with "rpctrace" enabled and extract the "Handling RPC" and "Handled RPC"
messages for the thread that prints this error, e.g.:

00000100:00100000:0:1216234365.071325:1536:32091:0:(service.c:1064:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cn_00:01318c63-cfd4-9199-8142-4e41ea812bd3+7:32099:x9:12345-0@lo:101
00000100:00100000:0:1216234366.071325:1536:32091:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 5820180: rc -2
00000100:00100000:0:1216234367.071325:1536:32091:0:(service.c:1064:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cn_00:01318c63-cfd4-9199-8142-4e41ea812bd3+7:32099:x9:12345-0@lo:101


Here the service thread printing the error is PID 32091, and the request
came from the client "0@lo" (in this made-up example, a local client).
Then you can check on the client (also with "vfstrace" and "rpctrace"
debugging on) what the thread that sent this RPC was trying to do
(client PID 32099, XID "9" in this example).
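
If the debug log is large, the matching can be scripted. The following is
only a rough sketch: the field layout is inferred from the made-up sample
lines above rather than from a format specification, the NID is written
as "12345-0@lo", and the names LINE_RE, RPC_RE and find_failing_rpcs are
hypothetical helpers, not part of any Lustre tool. It remembers the last
"Handling RPC" line seen on each service thread and reports it when that
thread logs the lvbo_init failure.

```python
import re

# Assumed layout: subsys:mask:cpu:time:stack:pid:extpid:(file:line:func()) msg
LINE_RE = re.compile(
    r"^[0-9a-f]+:[0-9a-f]+:\d+:\d+\.\d+:\d+:(?P<pid>\d+):\d+:"
    r"\([^:]+:\d+:\w+\(\)\)\s*(?P<msg>.*)$"
)
# Assumed layout of the RPC summary: pname:cluuid+ref:pid:xid:nid:opc
RPC_RE = re.compile(
    r"^(?P<phase>Handling|Handled) RPC \S+ "
    r"(?P<pname>[^:]+):(?P<cluuid>[^+]+)\+(?P<ref>\d+):"
    r"(?P<pid>\d+):x?(?P<xid>\d+):(?P<nid>.+):(?P<opc>\d+)$"
)

def find_failing_rpcs(lines, resource):
    """Return the in-flight RPC for each lvbo_init failure on `resource`."""
    in_flight = {}  # service thread PID -> fields of its current RPC
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        thread, msg = m.group("pid"), m.group("msg")
        rpc = RPC_RE.match(msg)
        if rpc and rpc.group("phase") == "Handling":
            in_flight[thread] = rpc.groupdict()
        elif rpc:
            in_flight.pop(thread, None)  # "Handled": request finished
        elif (f"lvbo_init failed for resource {resource}:" in msg
              and thread in in_flight):
            r = in_flight[thread]
            hits.append({
                "server_thread": thread,
                "client_nid": r["nid"],
                "client_pid": r["pid"],
                "xid": r["xid"],
                "opc": r["opc"],
            })
    return hits

# Made-up sample lines, mirroring the example in this message.
SAMPLE = [
    "00000100:00100000:0:1216234365.071325:1536:32091:0:(service.c:1064:ptlrpc_server_handle_request()) Handling RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cn_00:01318c63-cfd4-9199-8142-4e41ea812bd3+7:32099:x9:12345-0@lo:101",
    "00000100:00100000:0:1216234366.071325:1536:32091:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 5820180: rc -2",
    "00000100:00100000:0:1216234367.071325:1536:32091:0:(service.c:1064:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc ldlm_cn_00:01318c63-cfd4-9199-8142-4e41ea812bd3+7:32099:x9:12345-0@lo:101",
]

if __name__ == "__main__":
    for hit in find_failing_rpcs(SAMPLE, 5820180):
        print(hit)
```

The client NID, client PID and XID it reports are the values to look for
in the client-side debug log.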

To quiet it, the easiest mechanism is probably to just delete this file.
If it is a small file (< 1MB) and the data is still valid you could copy
it to another file and rename it over the old one.
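
The copy-and-rename step could be sketched roughly as follows, assuming
the affected file has already been identified and is still fully
readable. The function name recopy_in_place is hypothetical, and only
Python standard-library calls are used. The copy is made in the same
directory so that the final rename stays within one filesystem; mode
bits and timestamps are carried over, but ownership is not.

```python
import os
import shutil
import tempfile

def recopy_in_place(path):
    """Copy `path` to a temporary file beside it, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".recopy-")
    os.close(fd)
    try:
        shutil.copyfile(path, tmp)  # raises OSError if the data cannot be read
        shutil.copystat(path, tmp)  # keep permissions and timestamps
        os.replace(tmp, path)       # rename the copy over the old file
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

if __name__ == "__main__":
    # Self-contained demonstration on a scratch file.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "w") as f:
            f.write("still valid\n")
        recopy_in_place(target)
        with open(target) as f:
            print(f.read(), end="")
```

If the read fails partway through, the original file is left as it was,
and deleting it remains the fallback.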

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



