[lustre-discuss] slow mount of lustre CentOS6 clients to 2.9 servers

Oucharek, Doug S doug.s.oucharek at intel.com
Fri May 5 11:09:02 PDT 2017


Are the NIDs "192.168.xxx.yyy at o2ib" really configured that way, or did you modify those logs when pasting them into the email?
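
(Either way, the NIDs a node is actually using can be verified directly on that node. Assuming a standard client install, something like

  lctl list_nids      # NIDs configured on the local node
  lnetctl net show    # fuller LNet configuration, on releases that ship lnetctl

run on both the client and the MGS will show what is really configured.)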

Doug

On May 5, 2017, at 11:02 AM, Grigory Shamov <Grigory.Shamov at umanitoba.ca> wrote:

Hi All,

We are installing a new Lustre storage system.
To that end, we have built new clients with the following configuration:

CentOS 6.8, kernel 2.6.32-642.el6.x86_64
Mellanox OFED 3.4.1.0 (on QDR fabric)

and either lustre-2.8.0 or lustre-2.9.0 clients, which we rebuilt from source. The new server is Lustre 2.9 on CentOS 7.3.
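
In case the build procedure matters, here is a rough sketch of the rebuild against the MOFED kernel modules; the kernel and OFED source paths below are typical defaults rather than our exact invocation:

  sh autogen.sh
  ./configure --disable-server \
              --with-linux=/usr/src/kernels/2.6.32-642.el6.x86_64 \
              --with-o2ib=/usr/src/ofa_kernel/default    # MOFED kernel sources
  make rpms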

Now, the clients we built have a problem mounting the filesystem. It takes a long time and/or fails initially, with messages as follows (for the 2.8 client):

mounting device 192.168.xxx.yyy at o2ib:/lustre at /lustrenew, flags=0x400 options=flock,device=192.168.xxx.yyy at o2ib:/lustre
mount.lustre: mount 192.168.xxx.yyy at o2ib:/lustre at /lustrenew failed: Input/output error retries left: 0
mount.lustre: mount 192.168.xxx.yyy at o2ib:/lustre at /lustrenew failed: Input/output error
Is the MGS running?
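
For reference, the invocation behind the log above is essentially the standard Lustre client mount, with the device, options and mount point taken from the first log line:

  mount -t lustre -o flock 192.168.xxx.yyy@o2ib:/lustre /lustrenew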

In dmesg:

LNet: HW CPU cores: 24, npartitions: 4
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)
alg: No test for crc32 (crc32-pclmul)
Lustre: Lustre: Build Version: 2.8.0-RC5--PRISTINE-2.6.32-642.el6.x86_64
LNet: Added LNI 192.168.aaa.bbb at o2ib [8/256/0/180]
Lustre: 3476:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1493927511/real 0]  req at ffff88061a1aac80 x1566496533774340/t0(0) o250->MGC192.168.xxx.yyy at o2ib@192.168.xxx.yyy@o2ib:26/25 lens 520/544 e 0 to 1 dl 1493927516 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC192.168.xxx.yyy at o2ib: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
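
For what it is worth, one LNet-level check that can be run from an affected client is a ping of the MGS NID; it only exercises o2ib connectivity, not the config-log fetch that is timing out above:

  lctl ping 192.168.xxx.yyy@o2ib    # lists the peer's NIDs if LNet can reach it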


The initial mount would thus fail; then the mount succeeds, but the OSTs take a long time to become active:

UUID                  1K-blocks        Used  Available Use% Mounted on
lustre-MDT0000_UUID  1156701708      751100  1077936556  0% /lustrenew[MDT:0]
OST0000            : inactive device
OST0001            : inactive device
OST0002            : inactive device
OST0003            : inactive device
OST0004            : inactive device
OST0005            : inactive device
OST0006            : inactive device
OST0007            : inactive device

filesystem summary:            0          0          0  0% /lustrenew
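
(The listing above is from lfs df.) While the OSTs show as inactive, the per-target connection state can be watched from the client with something like the following; the device names assume the default osc/mgc naming:

  lctl get_param osc.*.import | grep -E 'target:|state:'   # per-OST import state
  lctl get_param mgc.*.import | grep state:                 # MGS connection state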

Then, after some 10 minutes, the mount completes, and performance-wise Lustre seems to be normal.

The corresponding dmesg output from the 2.9 client:

LNet: HW CPU cores: 24, npartitions: 2
alg: No test for adler32 (adler32-zlib)
alg: No test for crc32 (crc32-table)
alg: No test for crc32 (crc32-pclmul)
Lustre: Lustre: Build Version: 2.9.0
LNet: Added LNI 192.168.aaa.bbb at o2ib [8/256/0/180]
Lustre: 3468:0:(client.c:2111:ptlrpc_expire_one_request()) @@@ Request sent has timed out for sent delay: [sent 1493929145/real 0]  req at ffff880631d07c80 x1566498247147536/t0(0) o250->MGC192.168.xxx.yyy at o2ib@192.168.xxx.yyy@o2ib:26/25 lens 520/544 e 0 to 1 dl 1493929150 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
LustreError: 15c-8: MGC192.168.xxx.yyy at o2ib: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Lustre: Unmounted lustre-client
LustreError: 3413:0:(obd_mount.c:1449:lustre_fill_super()) Unable to mount  (-5)

I am at a loss as to what would cause such behavior. Could anyone advise where to look for the causes of this problem? Thank you very much in advance!

--
Grigory Shamov
HPC Site Lead,
University of Manitoba
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
