[lustre-discuss] Unable to mount client with 56 MDSes and beyond
Matt Rásó-Barnett
matt at rasobarnett.com
Thu Jul 4 05:27:07 PDT 2019
I just tried out this configuration and was able to reproduce what Scott
saw on 2.12.2.
I couldn't see a Jira ticket for this though, so I've opened a new
one: https://jira.whamcloud.com/browse/LU-12506
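
As a reading aid for the client log quoted below: Lustre target names
carry the MDT index as four hex digits, and the kernel reports failures
as negated errno values. The sketch below decodes both for the values
in the log. The helper names (decode_target, decode_rc) are made up for
illustration and are not part of any Lustre tool, and the errno lookup
assumes standard Linux numbering.

```python
import errno
import os
import re


def decode_target(name):
    """Return (zero-based MDT index, 1-based position) for a target name.

    Assumes the usual Lustre naming, e.g. 'lustre-MDT0037-mdc', where the
    four characters after 'MDT' are the index in hexadecimal.
    """
    match = re.search(r"MDT([0-9a-fA-F]{4})", name)
    if match is None:
        raise ValueError(f"no MDT index found in {name!r}")
    index = int(match.group(1), 16)
    return index, index + 1


def decode_rc(rc):
    """Map a negated kernel return code such as -16 to (symbol, message)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    index, position = decode_target("lustre-MDT0037-mdc-ffff95923d31b000")
    symbol, message = decode_rc(-16)
    print(f"MDT index {index} (MDT number {position}), rc -16 = {symbol}: {message}")
```

Under those assumptions, MDT0037 is index 55, i.e. the 56th MDT, which
matches the count at which the mount starts failing, and -16 decodes to
EBUSY. This only interprets the log; it says nothing about the cause.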
Cheers,
--
Matt Rásó-Barnett
University of Cambridge
On Wed, May 22, 2019 at 08:02:59AM +0000, Andreas Dilger wrote:
>Scott, if you haven't already done so, it is probably best to file a
>ticket in Jira with the details. Please include the client
>syslog/dmesg as well as a Lustre debug log ("lctl dk /tmp/debug") so
>that the problem can be isolated.
>
>During DNE development we tested with up to 128 MDTs in AWS, but
>haven't tested that many MDTs in some time.
>
>Cheers, Andreas
>
>On May 8, 2019, at 12:28, White, Scott F <sfpwhite at lanl.gov> wrote:
>>
>> We’ve been testing DNE Phase II and tried scaling the number of
>> MDSes (one MDT each for all of our tests) very high, but when we did
>> that, we couldn’t mount the filesystem on a client. After trial and
>> error, we discovered that we were unable to mount the filesystem when
>> there were 56 MDSes. 55 MDSes mounted without issue, and it appears
>> any number below that will mount. This failure at 56 MDSes was
>> replicable across different nodes being used for the MDSes, all of
>> which were tested with working configurations, so it doesn’t seem to
>> be a bad server.
>>
>> Here’s the error info we saw in dmesg on the client:
>>
>> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup
>> lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)
>> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler())
>> MGCx.x.x.x at o2ib: cfg command failed: rc = -16
>> Lustre: cmd=cf003 0:lustre-MDT0037-mdc 1:lustre-MDT0037_UUID
>> 2:x.x.x.x at o2ib
>> LustreError: 15c-8: MGCx.x.x.x at o2ib: The configuration from log
>> 'lustre-client' failed (-16). This may be the result of communication
>> errors between this node and the MGS, a bad configuration, or other
>> errors. See the syslog for more information.
>> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not
>> setup
>> Lustre: Unmounted lustre-client
>> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to
>> mount (-16)
>>
>> OS: CentOS 7.6.1810
>> Kernel: 3.10.0-957.5.1.el7.x86_64
>> Lustre: 2.12.1
>> Network card: Qlogic InfiniPath_QLE7340
>>
>> Other things to note for completeness’ sake: this happened with both
>> ldiskfs and zfs backfstypes, and these tests were using files in
>> memory as the backing devices.
>>
>> Is there something I’m missing as to why 56 or more MDSes won’t
>> mount?
>>
>> Thanks,
>> Scott White
>> Scientist, HPC
>> Los Alamos National Laboratory
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>Cheers, Andreas
>--
>Andreas Dilger
>Principal Lustre Architect
>Whamcloud