[lustre-discuss] Question About Mellanox-RDMA On Lustre
王烁斌
w14767780617 at 163.com
Tue Jun 6 01:50:52 PDT 2023
Hi~
I want to set up a two-node Lustre server environment that uses RDMA between the nodes to improve server response performance.
After installing Lustre and the drivers that support RDMA, I ran into a problem while deploying the Lustre file system.
When mounting an MDT on the second node, the following error occurred:
[root@172-0-37-83 ~]# mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2
mount.lustre: mount /dev/mapper/mpathcj at /mnt/tfs/mgs2 failed: Connection timed out
Log information:
Jun 6 04:44:25 localhost kernel: LNetError: 23212:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover
Jun 6 04:44:31 localhost kernel: LNetError: 23213:0:(o2iblnd.c:819:kiblnd_create_conn()) cmid HCA(mlx5_0), kib_dev(ens14f0np0) need failover
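To see at a glance which HCA and network interface these errors name, the two fields can be pulled out of the kernel log. This is only a minimal sketch that assumes the message format shown above; `extract_o2ib_devs` is a hypothetical helper name, not part of Lustre.

```shell
# Hypothetical helper: read kernel log lines on stdin and print each distinct
# "HCA kib_dev" pair found in o2iblnd "need failover" messages.
extract_o2ib_devs() {
    sed -n 's/.*cmid HCA(\([^)]*\)), kib_dev(\([^)]*\)) need failover.*/\1 \2/p' | sort -u
}

# Example usage:
#   dmesg | extract_o2ib_devs
```

If more than one pair shows up, or the interface is not the one LNet was configured on, that may be worth checking first.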
I found a similar issue in the community, but the mount still failed after I tried reloading the module.
[LU-7124] MLX5: Limit hit in cap.max_send_wr - Whamcloud Community JIRA
May I ask what is causing this and what changes are needed to solve the problem?
-- Shuobin
The following is my configuration and formatting process:
node1
node2
mkfs.lustre --fsname=ltfs1 --mgs --mdt --index=0 --servicenode=192.168.19.14@o2ib1 --servicenode=192.168.19.15@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8
mkfs.lustre --fsname=ltfs1 --mdt --index=1 --mgsnode=192.168.19.14@o2ib1 --mgsnode=192.168.19.15@o2ib1 --failnode=192.168.19.15@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9
mkfs.lustre --fsname=ltfs1 --mdt --index=2 --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 --failnode=192.168.19.14@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa
mkfs.lustre --fsname=ltfs1 --mdt --index=3 --mgsnode=192.168.19.15@o2ib1 --mgsnode=192.168.19.14@o2ib1 --failnode=192.168.19.14@o2ib1 --reformat --mkfsoptions "-E stride=32" /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab
node1
mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde4066c0000a8 /mnt/tfs/mgs
mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde5093e0000a9 /mnt/tfs/mgs1
node2
mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde619060000aa /mnt/tfs/mgs2
mount -t lustre /dev/disk/by-id/scsi-3600b3420371420b645dde7367f0000ab /mnt/tfs/mgs3
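Before mounting targets on the second node, it may help to confirm that each server lists its own o2ib1 NID and can reach the other server over LNet. The following is a rough sketch, assuming `lctl` is installed and the `lnet` module is loaded; `is_o2ib_nid` and `check_lnet_peer` are hypothetical helper names, and the guess that node2 is 192.168.19.15 comes only from the addresses in the mkfs.lustre commands.

```shell
# Hypothetical helpers for checking LNet reachability; not part of Lustre itself.

# Return success if the argument looks like an IPv4 o2ib NID,
# e.g. 192.168.19.14@o2ib1.
is_o2ib_nid() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}@o2ib[0-9]*$'
}

# Print the local NIDs, then ping the peer NID over LNet.
# Needs root privileges and a loaded lnet module.
check_lnet_peer() {
    peer=$1
    if ! is_o2ib_nid "$peer"; then
        echo "not an o2ib NID: $peer" >&2
        return 2
    fi
    lctl list_nids || return 1
    lctl ping "$peer"
}

# Example usage on node2 (assumed to be 192.168.19.15), checking the MGS node:
#   check_lnet_peer 192.168.19.14@o2ib1
```

If the ping fails in either direction, the mount timeout is more likely an LNet or RDMA configuration problem than a problem with the formatted targets.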