[Lustre-discuss] l_getgroups message
Heiko Schroeter
schroete at iup.physik.uni-bremen.de
Tue Aug 19 00:44:15 PDT 2008
On Tuesday, 19 August 2008 01:59:09, Andreas Dilger wrote:
> On Aug 18, 2008 15:46 +0200, Heiko Schroeter wrote:
> > from time to time we see these messages on our MDS 1.6.5.1 during
> > copying data onto lustre.
> >
> > Is this just informational or an indicator of a broken setup ? Network
> > load problems ?
> >
> > We checked the group rights and they look ok to us. The lustre MDS system
> > including clients runs with YP setup. It seems to us that the message
> > comes from "lustre-1.6.5.1/lustre/utils/l_getgroups.c". But we cannot
> > nail down the real reason.
> >
> > Aug 18 14:39:53 mds1 l_getgroups: LONG OP getgrent loop: 25 elapsed, 3 expected
> > Aug 18 14:39:53 mds1 l_getgroups: LONG OP get_groups_local: 25 elapsed, 10 expected
>
> The reason is that it seems YP is taking longer than Lustre has expected it
> to. You should be able to remove these messages by increasing the timeout
> in /proc/fs/lustre/mds/{mds}/group_acquire_expire. You might also reduce
> the load on the YP server by increasing the group cache lifetime by
> increasing /proc/fs/lustre/mds/{mds}/group_expire_interval (default 600
> seconds).
>
> The message itself isn't harmful, just a warning at this point that your
> name services are taking longer than expected.
Yep, that seems to be the case. We set up a new YP server that serves only
the Lustre system, and the messages disappeared.
Not only that. We also had a problem where the client hung if we started
a 'du' or 'ls -la' on the Lustre file system while copying data onto it:
https://bugzilla.lustre.org/show_bug.cgi?id=16384
That problem disappeared as well with the new YP server.
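
For anyone who cannot dedicate a YP server, here is a rough sketch of the
tuning Andreas suggests above. It is only an illustration, not something we
ran: the helper names are made up, the values 30 and 1200 are arbitrary
examples (anything above the 25 we saw elapsed should do for the first one),
and it assumes both files accept a plain integer written as root on the MDS.

```python
import glob
import os


def read_tunable(mds_dir, name):
    """Return the current value of one tunable file as a stripped string."""
    with open(os.path.join(mds_dir, name)) as f:
        return f.read().strip()


def write_tunable(mds_dir, name, value):
    """Write a new value to one tunable file (needs root on a real MDS)."""
    with open(os.path.join(mds_dir, name), "w") as f:
        f.write(str(value))


def raise_group_timeouts(base="/proc/fs/lustre/mds",
                         acquire_expire=30, expire_interval=1200):
    """Set both group tunables for every MDS directory found under base.

    The numeric defaults here are illustrative assumptions, not recommended
    values. Returns {mds_dir: {tunable: previous_value}} so the change can
    be reverted by hand.
    """
    wanted = (("group_acquire_expire", acquire_expire),
              ("group_expire_interval", expire_interval))
    previous = {}
    for mds_dir in sorted(glob.glob(os.path.join(base, "*"))):
        if not os.path.isdir(mds_dir):
            continue
        old = {}
        for name, value in wanted:
            # Skip anything that does not expose this tunable.
            if not os.path.exists(os.path.join(mds_dir, name)):
                continue
            old[name] = read_tunable(mds_dir, name)
            write_tunable(mds_dir, name, value)
        if old:
            previous[mds_dir] = old
    return previous
```

Calling raise_group_timeouts() with no arguments on the MDS touches every
entry under /proc/fs/lustre/mds; the base argument is only there so the
function can be tried against a scratch directory first.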
Thanks for your help!
Heiko
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.