[Lustre-discuss] Help finding error in bugzilla

Andreas Dilger adilger at sun.com
Wed Jul 9 04:17:28 PDT 2008


On Jul 08, 2008  22:04 -0400, Jeff Blasius wrote:
> Jul  8 14:24:54 oss10 kernel: LustreError:
> 4572:0:(ldlm_lockd.c:646:ldlm_server_completion_ast()) ### enqueue
> wait took 7744763506us from 1215533749 ns: filter-lustre0-OST0009_UUID
> lock: 00000101a2b59580/0x9275e8a2d17f9488 lrc: 2/0,0 mode: PW/PW res:
> 68100256/0 rrc: 74 type: EXT [0->33554431] (req 0->4095) flags: 20
> remote: 0xf87d4d490599950 expref: 117 pid: 4757
> Jul  8 14:24:54 oss10 kernel: LustreError:
> 4572:0:(ldlm_lockd.c:646:ldlm_server_completion_ast()) Skipped 64
> previous similar messages

It looks like you have many processes writing to the start of the
same file (the request above is a PW lock on extent 0->4095).  That
causes unavoidable lock contention, and is most likely a bug in your
program (e.g. the binary is linked with gprof and every process is
overwriting the same output file).
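
The fields in the quoted message that support this reading can be pulled
out with a short script. This is only a sketch: the regular expression is
written against the one message quoted above, not against a documented log
format, and parse_enqueue_wait is a hypothetical helper name. It extracts
the wait time, the lock mode, the rrc value, and the granted and requested
extents.

```python
import re

# The message from the report, rejoined into a single line.
SAMPLE = (
    "4572:0:(ldlm_lockd.c:646:ldlm_server_completion_ast()) ### enqueue "
    "wait took 7744763506us from 1215533749 ns: filter-lustre0-OST0009_UUID "
    "lock: 00000101a2b59580/0x9275e8a2d17f9488 lrc: 2/0,0 mode: PW/PW res: "
    "68100256/0 rrc: 74 type: EXT [0->33554431] (req 0->4095) flags: 20 "
    "remote: 0xf87d4d490599950 expref: 117 pid: 4757"
)

# Assumption: field order and spelling match the sample message above.
PATTERN = re.compile(
    r"enqueue\s+wait\s+took\s+(?P<wait_us>\d+)us"
    r".*?mode:\s+(?P<mode>\w+/\w+)"
    r".*?rrc:\s+(?P<rrc>\d+)"
    r".*?type:\s+EXT\s+\[(?P<start>\d+)->(?P<end>\d+)\]"
    r"\s+\(req\s+(?P<req_start>\d+)->(?P<req_end>\d+)\)",
    re.DOTALL,
)


def parse_enqueue_wait(message):
    """Return the fields of interest as a dict, or None if there is no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    start, end = int(m["start"]), int(m["end"])
    req_start, req_end = int(m["req_start"]), int(m["req_end"])
    return {
        "wait_seconds": int(m["wait_us"]) / 1_000_000,
        "mode": m["mode"],
        "rrc": int(m["rrc"]),
        "lock_extent": (start, end),
        "req_extent": (req_start, req_end),
        # Extents are inclusive, so the length is end - start + 1.
        "req_bytes": req_end - req_start + 1,
    }


if __name__ == "__main__":
    for key, value in parse_enqueue_wait(SAMPLE).items():
        print(f"{key}: {value}")
```

For the message above, the requested extent starts at offset 0 and covers
4096 bytes, the mode is PW (protected write), and the enqueue wait is about
7745 seconds, which is consistent with many writers queuing for the first
block of one file.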

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



