[lustre-discuss] Unable to mount client with 56 MDSes and beyond

Colin Faber cfaber at gmail.com
Thu Jul 4 09:51:53 PDT 2019


We encountered this in testing some time ago and already have a bug filed
(I don't recall the number right now) and should have a patch soonish, if
not already. The gist of the problem is the changelog registration limit
(an integer type) plus some padding, resulting in an artificially low limit.
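
For reference, the -16 the client reports below is -EBUSY. As a rough sketch
of the kind of limit being described (purely illustrative, not the actual
Lustre code; the names are invented and the reserved-bit count is picked only
so the arithmetic lands on 55, matching the symptom), a fixed-width integer
used as a registration bitmap with a few bits reserved as padding runs out
earlier than you'd expect, and further registrations fail with -EBUSY:

#include <errno.h>
#include <stdio.h>

/* Hypothetical illustration only: names and sizes are invented.
 * A 64-bit bitmap tracks registration slots, but some bits are
 * reserved as padding, so the usable limit is lower than 64.
 */
#define SLOT_BITS      64
#define RESERVED_BITS   9                       /* invented, to land on 55 */
#define MAX_SLOTS      (SLOT_BITS - RESERVED_BITS)

static int register_slot(unsigned long long *bitmap)
{
        int i;

        for (i = 0; i < MAX_SLOTS; i++) {
                if (!(*bitmap & (1ULL << i))) {
                        *bitmap |= 1ULL << i;
                        return i;               /* slot granted */
                }
        }
        return -EBUSY;                          /* -16: no free slot left */
}

int main(void)
{
        unsigned long long bitmap = 0;
        int i, rc;

        for (i = 1; i <= 60; i++) {
                rc = register_slot(&bitmap);
                if (rc < 0) {
                        /* prints: registration 56 failed: rc = -16 */
                        printf("registration %d failed: rc = %d\n", i, rc);
                        break;
                }
        }
        return 0;
}

Again, the real limit in Lustre may come from a different structure entirely;
the point is just that an integer-typed limit plus reserved/padding entries
can cap out at an odd-looking number like 55.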

On Thu, Jul 4, 2019, 6:42 AM Matt Rásó-Barnett <matt at rasobarnett.com> wrote:

> I just tried out this configuration and was able to reproduce what Scott
> saw on 2.12.2.
>
> I couldn't see an existing Jira ticket for this though, so I've opened a
> new one: https://jira.whamcloud.com/browse/LU-12506
>
> Cheers,
> --
> Matt Rásó-Barnett
> University of Cambridge
>
> On Wed, May 22, 2019 at 08:02:59AM +0000, Andreas Dilger wrote:
> >Scott, if you haven't already done so, it is probably best to file a
> >ticket in Jira with the details.  Please include the client
> >syslog/dmesg as well as a Lustre debug log ("lctl dk /tmp/debug") so
> >that the problem can be isolated.
> >
> >During DNE development we tested with up to 128 MDTs in AWS, but
> >haven't tested that many MDTs in some time.
> >
> >Cheers, Andreas
> >
> >On May 8, 2019, at 12:28, White, Scott F <sfpwhite at lanl.gov> wrote:
> >>
> >> We’ve been testing DNE Phase II and tried scaling the number of
> >> MDSes (one MDT each for all of our tests) very high, but when we did
> >> that, we couldn’t mount the filesystem on a client.  After trial and
> >> error, we discovered that we were unable to mount the filesystem when
> >> there were 56 MDSes. 55 MDSes mounted without issue, and it appears
> >> any number below that will mount. This failure at 56 MDSes was
> >> replicable across different nodes being used for the MDSes, all of
> >> which were tested with working configurations, so it doesn’t seem to
> >> be a bad server.
> >>
> >> Here’s the error info we saw in dmesg on the client:
> >>
> >> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup
> >> lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)
> >> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler())
> >> MGCx.x.x.x@o2ib: cfg command failed: rc = -16
> >> Lustre:    cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID
> >> 2:x.x.x.x@o2ib
> >> LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log
> >> 'lustre-client' failed (-16). This may be the result of communication
> >> errors between this node and the MGS, a bad configuration, or other
> >> errors. See the syslog for more information.
> >> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not
> >> setup
> >> Lustre: Unmounted lustre-client
> >> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to
> >> mount  (-16)
> >>
> >> OS: CentOS 7.6.1810
> >> Kernel: 3.10.0-957.5.1.el7.x86_64
> >> Lustre: 2.12.1
> >> Network card: Qlogic InfiniPath_QLE7340
> >>
> >> Other things to note for completeness’ sake: this happened with both
> >> ldiskfs and zfs backfstypes, and these tests were using files in
> >> memory as the backing devices.
> >>
> >> Is there something I’m missing as to why more than 56 MDSes won’t
> >> mount?
> >>
> >> Thanks,
> >> Scott White
> >> Scientist, HPC
> >> Los Alamos National Laboratory
> >>
> >
> >Cheers, Andreas
> >--
> >Andreas Dilger
> >Principal Lustre Architect
> >Whamcloud
> >
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>