[Lustre-discuss] l_getgroups message

Andreas Dilger adilger at sun.com
Mon Aug 18 16:59:09 PDT 2008


On Aug 18, 2008  15:46 +0200, Heiko Schroeter wrote:
> from time to time we see these messages on our MDS 1.6.5.1 during
> copying data onto lustre.
> 
> Is this just informational or an indicator of a broken setup? Network load
> problems?
> 
> We checked the group rights and they look ok to us. The lustre MDS system 
> including clients runs with YP setup.  It seems to us that the message comes 
> from  "lustre-1.6.5.1/lustre/utils/l_getgroups.c". But we cannot nail down 
> the real reason.

> Aug 18 14:39:53 mds1 l_getgroups: LONG OP getgrent loop: 25 elapsed, 3 expected
> Aug 18 14:39:53 mds1 l_getgroups: LONG OP get_groups_local: 25 elapsed, 10 expected

It looks like YP is taking longer to answer than Lustre expects.  You should
be able to get rid of these messages by increasing the timeout in
/proc/fs/lustre/mds/{mds}/group_acquire_expire.  You might also reduce the
load on the YP server by keeping cached group entries longer, i.e. by raising
/proc/fs/lustre/mds/{mds}/group_expire_interval (default 600 seconds).
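
As a rough illustration of inspecting and raising the two tunables named
above, here is a minimal Python sketch.  It assumes each file holds a single
integer and that writing a new integer to it changes the setting; the helper
names (read_tunable, write_tunable, show_tunables) are made up for this
example, and the {mds} part of the path is left for you to fill in.

```python
from pathlib import Path

# Tunable file names as given in the reply above.
GROUP_TUNABLES = ("group_acquire_expire", "group_expire_interval")


def read_tunable(mds_dir, name):
    """Return the current value of one tunable (assumed to be one integer)."""
    return int(Path(mds_dir, name).read_text().split()[0])


def write_tunable(mds_dir, name, value):
    """Write a new integer value; on a real MDS this would need root."""
    Path(mds_dir, name).write_text(f"{int(value)}\n")


def show_tunables(mds_dir):
    """Collect both group-related tunables into a dict."""
    return {name: read_tunable(mds_dir, name) for name in GROUP_TUNABLES}


# Hypothetical usage, with the directory of your own MDS substituted:
#   mds_dir = "/proc/fs/lustre/mds/{mds}"
#   print(show_tunables(mds_dir))
#   write_tunable(mds_dir, "group_expire_interval", 1200)
```

Passing the directory in as an argument keeps the helpers easy to try against
a scratch directory before touching a live MDS.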

The message itself isn't harmful, just a warning at this point that your name
services are taking longer than expected.
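
To check how slow the name services really are, one option is to time a full
group enumeration on the MDS yourself.  The sketch below is not what
l_getgroups does internally; it uses Python's grp.getgrall(), which walks the
group database through whatever name services the host is configured with,
and it assumes the "elapsed" and "expected" numbers in the log are seconds.
The function names are hypothetical.

```python
import grp
import time


def time_group_enumeration():
    """Enumerate the whole group database and return (seconds, group count)."""
    start = time.monotonic()
    groups = grp.getgrall()
    elapsed = time.monotonic() - start
    return elapsed, len(groups)


def is_long_op(elapsed, expected):
    """True when the lookup took longer than the expected number of seconds."""
    return elapsed > expected


if __name__ == "__main__":
    elapsed, count = time_group_enumeration()
    print(f"enumerated {count} groups in {elapsed:.2f} s")
    # 3 is the "expected" figure for the getgrent loop in the log above.
    if is_long_op(elapsed, expected=3):
        print("slower than the 3 s expected for the getgrent loop")
```

If this regularly takes many seconds while data is being copied, that points
at the YP server or the network path to it rather than at Lustre.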

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



