[Lustre-discuss] l_getgroups message

Heiko Schroeter schroete at iup.physik.uni-bremen.de
Tue Aug 19 00:44:15 PDT 2008


Am Dienstag, 19. August 2008 01:59:09 schrieb Andreas Dilger:
> On Aug 18, 2008  15:46 +0200, Heiko Schroeter wrote:
> > From time to time we see these messages on our MDS (1.6.5.1) while
> > copying data onto Lustre.
> >
> > Is this just informational, or an indicator of a broken setup? Network
> > load problems?
> >
> > We checked the group rights and they look OK to us. The Lustre MDS system,
> > including the clients, runs with a YP setup. It seems to us that the
> > message comes from "lustre-1.6.5.1/lustre/utils/l_getgroups.c", but we
> > cannot nail down the real reason.
> >
> > Aug 18 14:39:53 mds1 l_getgroups: LONG OP getgrent loop: 25 elapsed, 3 expected
> > Aug 18 14:39:53 mds1 l_getgroups: LONG OP get_groups_local: 25 elapsed, 10 expected
>
> The reason seems to be that YP is taking longer than Lustre expects.  You
> should be able to remove these messages by increasing the timeout in
> /proc/fs/lustre/mds/{mds}/group_acquire_expire.  You might also reduce the
> load on the YP server by lengthening the group cache lifetime, i.e. by
> increasing /proc/fs/lustre/mds/{mds}/group_expire_interval (default 600
> seconds).
>
> The message itself isn't harmful, just a warning at this point that your
> name services are taking longer than expected.
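
For reference, adjusting the two tunables named above could be scripted
roughly as follows. This is only a sketch: it assumes each file holds a
single integer, the helper names (read_tunable, raise_tunable) are made up
for illustration, and the directory passed in stands for the
/proc/fs/lustre/mds/{mds} path quoted above.

```python
import os

# Tunable names as quoted in the reply above.
TUNABLES = ("group_acquire_expire", "group_expire_interval")


def read_tunable(mds_dir, name):
    """Return the current integer value of a tunable file under mds_dir."""
    with open(os.path.join(mds_dir, name)) as f:
        return int(f.read().strip())


def raise_tunable(mds_dir, name, new_value):
    """Write new_value only if it is larger than the current value.

    Returns the value that was kept or written, so a call never
    lowers a setting by accident.
    """
    current = read_tunable(mds_dir, name)
    if new_value <= current:
        return current
    with open(os.path.join(mds_dir, name), "w") as f:
        f.write(str(new_value))
    return new_value
```

On the MDS this would be run as root with mds_dir pointing at the real
/proc directory; suitable values depend on how slow the name service is.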

Yep, that seems to be the case. We set up a new YP server that solely serves
the Lustre system, and the messages disappeared.
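
A rough way to check how long group enumeration takes on a given host is to
time it directly. The sketch below uses Python's standard grp module, which
walks the group database through the system name service; whether that
actually goes through YP depends on the host's nsswitch configuration. The
function name is made up for illustration, and this is not what l_getgroups
itself runs, only a comparable measurement.

```python
import grp
import time


def time_group_enumeration():
    """Enumerate all groups via the name service; return (count, seconds)."""
    start = time.monotonic()
    groups = grp.getgrall()
    return len(groups), time.monotonic() - start


if __name__ == "__main__":
    count, elapsed = time_group_enumeration()
    print(f"enumerated {count} groups in {elapsed:.2f} s")
```

If the elapsed time is well above the "expected" figures in the log
messages, the name service is the likely bottleneck.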

Not only that: we also had problems where the client hung when we started
a 'du' or 'ls -la' on the Lustre file system while copying data onto it.
https://bugzilla.lustre.org/show_bug.cgi?id=16384
That problem disappeared as well with the new YP server.

Thanks for your help!
Heiko

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.




