[lustre-discuss] Unable to mount client with 56 MDSes and beyond

Matt Rásó-Barnett matt at rasobarnett.com
Thu Jul 4 05:27:07 PDT 2019

I just tried out this configuration and was able to reproduce what Scott 
saw on 2.12.2.

I couldn't see a Jira ticket for this, though, so I've opened a new 
one: https://jira.whamcloud.com/browse/LU-12506

Matt Rásó-Barnett
University of Cambridge

On Wed, May 22, 2019 at 08:02:59AM +0000, Andreas Dilger wrote:
>Scott, if you haven't already done so, it is probably best to file a 
>ticket in Jira with the details.  Please include the client 
>syslog/dmesg as well as a Lustre debug log ("lctl dk /tmp/debug") so 
>that the problem can be isolated.
>During DNE development we tested with up to 128 MDTs in AWS, but 
>haven't tested that many MDTs in some time.
>Cheers, Andreas
>On May 8, 2019, at 12:28, White, Scott F <sfpwhite at lanl.gov> wrote:
>> We’ve been testing DNE Phase II and tried scaling the number of 
>> MDSes (one MDT each for all of our tests) very high, but when we did 
>> that, we couldn’t mount the filesystem on a client.  After trial and 
>> error, we discovered that we were unable to mount the filesystem when 
>> there were 56 MDSes. 55 MDSes mounted without issue, and it appears 
>> any number below that will mount. This failure at 56 MDSes was 
>> replicable across different nodes being used for the MDSes, all of 
>> which were tested with working configurations, so it doesn’t seem to 
>> be a bad server.
>> Here’s the error info we saw in dmesg on the client:
>> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup 
>> lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)
>> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler()) 
>> MGCx.x.x.x@o2ib: cfg command failed: rc = -16
>> Lustre:    cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID  
>> 2:x.x.x.x@o2ib
>> LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log 
>> 'lustre-client' failed (-16). This may be the result of communication 
>> errors between this node and the MGS, a bad configuration, or other 
>> errors. See the syslog for more information.
>> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not 
>> setup
>> Lustre: Unmounted lustre-client
>> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to 
>> mount  (-16)
>> OS: CentOS 7.6.1810
>> Kernel: 3.10.0-957.5.1.el7.x86_64
>> Lustre: 2.12.1
>> Network card: Qlogic InfiniPath_QLE7340
>> Other things to note for completeness’ sake: this happened with both 
>> ldiskfs and zfs backfstypes, and these tests were using files in 
>> memory as the backing devices.
>> Is there something I’m missing as to why 56 or more MDSes won’t 
>> mount?
>> Thanks,
>> Scott White
>> Scientist, HPC
>> Los Alamos National Laboratory
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>Andreas Dilger
>Principal Lustre Architect
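
For anyone reading the dmesg output quoted above: Lustre target names carry 
the target index as four hexadecimal digits, so lustre-MDT0037 is index 
0x37 = 55, which is the 56th MDT when counting from MDT0000. The return code 
-16 is the negated Linux errno EBUSY. Below is a minimal sketch that pulls 
both values out of the class_setup() line from the thread. The function name 
decode_setup_failure is hypothetical (it is not part of any Lustre tool), 
and the sketch only decodes the message; it says nothing about why the 
setup fails.

```python
import errno
import re

# The class_setup() failure from the thread, joined onto a single line.
LINE = ("LustreError: 28880:0:(obd_config.c:559:class_setup()) setup "
        "lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)")


def decode_setup_failure(line):
    """Return (mdt_index, errno_name) for an MDC setup failure line.

    Hypothetical helper for illustration only.
    """
    m = re.search(r"-MDT([0-9a-fA-F]{4})-mdc\S* failed \((-?\d+)\)", line)
    if m is None:
        raise ValueError("not a recognised MDC setup failure line")
    mdt_index = int(m.group(1), 16)   # target index is written in hex
    rc = int(m.group(2))              # kernel-style negative errno
    return mdt_index, errno.errorcode.get(-rc, "unknown")


index, name = decode_setup_failure(LINE)
print(f"MDT index {index} (MDT number {index + 1}), error {name}")
```

Run against the line above, this reports index 55, i.e. the 56th MDT, 
failing with EBUSY, which matches the observation that 55 MDSes mount and 
56 do not.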
