[Lustre-discuss] Watchdog triggered for pid 5383: it was inactive for 100s (+ stack trace)
Andreas Dilger
adilger at sun.com
Mon Dec 22 00:15:34 PST 2008
On Dec 18, 2008 13:47 -0600, Hendelman, Rob wrote:
> Is this something to be concerned about? We have quite a few of these. This is on our mgs/mds box.
>
> Our mgs/mds aren't in one filesystem (separate spindles with separate spindles for journals as well), but are on the same box.
>
> Lustre: 0:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for
> process 5383
You left out some important messages that precede this in the log and explain why the watchdog was triggered.
> Call Trace:
> [<ffffffff80097e35>] call_usermodehelper_keys+0xea/0xff
> [<ffffffff80097e4a>] __call_usermodehelper+0x0/0x4f
> [<ffffffff884065af>] :lvfs:upcall_cache_get_entry+0x5bf/0xa50
This implies that you are using something like LDAP for user/group lookups
on the MDS, and that the LDAP server can't reply in a timely manner (e.g.
within several seconds). You can reduce the load on your LDAP server by
increasing /proc/fs/lustre/mds/myth-MDT0000/group_expire_interval
(the number of seconds before a user->group mapping is refreshed; default 600s).
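
As a rough illustration of the tuning step, here is a minimal Python sketch that reads the current interval and writes a larger one. It assumes the /proc file holds a single integer number of seconds and accepts a plain integer written back to it, and that the script runs as root on the MDS. The helper names read_interval and set_interval are hypothetical, not part of Lustre.

```python
from pathlib import Path

# Path taken from the message above; "myth-MDT0000" is that site's MDT name
# and will differ on other filesystems.
TUNABLE = Path("/proc/fs/lustre/mds/myth-MDT0000/group_expire_interval")


def read_interval(path=TUNABLE):
    """Return the current expiry interval in seconds.

    Assumption: the file contains a single integer, possibly followed
    by whitespace or a newline.
    """
    return int(Path(path).read_text().split()[0])


def set_interval(seconds, path=TUNABLE):
    """Write a new expiry interval and return the previous value.

    Assumption: the tunable accepts a plain decimal integer.
    """
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    previous = read_interval(path)
    Path(path).write_text(f"{seconds}\n")
    return previous
```

For example, calling set_interval(1800) would raise the interval from the default 600s to 30 minutes; 1800 is only an illustrative value, not a recommendation. A setting written through /proc this way would typically not survive a remount or reboot.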
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.