[Lustre-discuss] IOR Single File -- lock callback timer expired
Andreas Dilger
adilger at sun.com
Fri Dec 12 17:10:35 PST 2008
On Dec 10, 2008 13:21 -0500, Roger Spellman wrote:
> I have a customer running IOR on 128 clients, using IOR's POSIX mode to
> create a single file.
>
> The clients are running Lustre 1.6.6. The servers are running Lustre
> 1.6.5.
If the file is not striped over multiple OSTs, the single OST it lives on
(the default stripe count is 1) may simply be overloaded.
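A minimal sketch of how one might check and widen the striping, assuming the
lfs utility is in the PATH on a client. The function names and the LFS
variable are made up for illustration, and the option spelling should be
checked against the lfs man page for the installed release. Note that
striping is fixed when a file is created, so the wider layout has to be set
on the directory (or on a new, empty file) before IOR creates its output.

```shell
# Hypothetical helpers; LFS can be overridden, e.g. for a dry run with "echo".
LFS="${LFS:-lfs}"

# Show the stripe count and the OST objects backing a file or directory.
show_striping() {
    "$LFS" getstripe "$1"
}

# Ask for new files created in a directory to be striped over all OSTs
# (a stripe count of -1 means "all available OSTs").
stripe_over_all_osts() {
    "$LFS" setstripe -c -1 "$1"
}

# Example use (the path is an assumption, not taken from the report):
#   show_striping /mnt/lustre/ior_dir/testfile
#   stripe_over_all_osts /mnt/lustre/ior_dir
```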
> mpiexec noticed that job rank 0 with PID 7520 on node whitney160
> exited on signal 42 (Real-time signal 8).
>
> Looking at the logs on the servers, I see a bunch of messages like the
> following:
>
> Dec 9 18:23:38 ts-sandia-02 kernel: LustreError:
> 0:0:(ldlm_lockd.c:234:waiting_locks_callback()) ### lock callback timer
> expired after 116s: evicting client at 192.168.121.32 at o2ib ns:
> filter-scratch-OST0000_UUID lock: ffff810014239600/0x6316855aa9d9f014
> lrc: 1/0,0 mode: PW/PW res: 5987/0 rrc: 373 type: EXT
> [1409286144->1442840575] (req 1409286144->1410334719) flags: 20
> remote: 0x77037709d529258a expref: 28 pi
>
>
> What might be causing this?
This indicates that, from the OST's point of view, the client has neither
cancelled the lock nor done any writes under it in the past 2 minutes.
It would be worthwhile for you to check the RPC IO stats to see how long
writes are taking on this OST:
llstat -i 1 /proc/fs/lustre/ost/OSS/ost_io/stats
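As an aside on reading the quoted log message: the extent fields appear to
give the byte range the lock covers, followed by the range the client
originally requested. The sketch below only does the arithmetic on those
numbers; the function name is made up, and the reading of the two ranges as
"granted" and "requested" is my interpretation of the message format.

```shell
# Hypothetical helper: extract "[start->end] (req start->end)" from a log
# fragment and print the size in bytes of each (inclusive) range.
lock_extent_sizes() {
    set -- $(printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9]*\)->\([0-9]*\)] (req \([0-9]*\)->\([0-9]*\)).*/\1 \2 \3 \4/p')
    [ $# -eq 4 ] || return 1
    echo "granted=$(( $2 - $1 + 1 )) requested=$(( $4 - $3 + 1 ))"
}

lock_extent_sizes '[1409286144->1442840575] (req 1409286144->1410334719) flags: 20'
# prints: granted=33554432 requested=1048576
```

That is a 32 MiB extent granted for a 1 MiB request, so with 128 clients
writing to one file there are likely to be many callbacks on overlapping
extents on this OST.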
> Can I fix this problem by extending timers, such as
> /proc/sys/lustre/timeout and /proc/sys/lustre/ldlm_timeout ?
Increasing /proc/sys/lustre/timeout would likely help.
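A minimal sketch of checking and raising it, assuming root access on the
node. The function names and the LUSTRE_TIMEOUT_FILE override are made up
for illustration, and the value 300 in the example is only a placeholder,
not a recommendation. A value written through /proc does not survive a
reboot, and it probably wants to be set consistently on the servers and the
clients.

```shell
# Hypothetical helpers; the path is the one from the question above and can
# be overridden through LUSTRE_TIMEOUT_FILE (an assumption, for testing).
LUSTRE_TIMEOUT_FILE="${LUSTRE_TIMEOUT_FILE:-/proc/sys/lustre/timeout}"

# Print the current timeout in seconds.
show_lustre_timeout() {
    cat "$LUSTRE_TIMEOUT_FILE"
}

# Set a new timeout in seconds; refuses anything that is not a plain number.
set_lustre_timeout() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_lustre_timeout SECONDS" >&2; return 1 ;;
    esac
    echo "$1" > "$LUSTRE_TIMEOUT_FILE"
}

# Example use:
#   show_lustre_timeout
#   set_lustre_timeout 300
```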
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.