[lustre-discuss] Regarding Lustre with RDMA

Andreas Dilger adilger at whamcloud.com
Thu Jan 5 12:19:10 PST 2023


On Jan 5, 2023, at 04:12, Nick dan via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

We have configured the Lustre filesystem and mounted it on the client. This was done over IP, as shown below:

[root@cl01 user]# df -hT
11.11.1.211@tcp:/lustre lustre   9.2G   42M  8.7G   1% /mnt/lustre

Part 2:
For the next part, we don't want to use this command, because the connection goes over IP: mount -t lustre 11.11.1.211@tcp:/lustre /mnt/lustre
Instead, we want to mount a particular block device as a Lustre filesystem. For example, as shown below, we have created 2 partitions of 10G each on the server and are sharing the OST partition to the client as a block device.
This is our client, which now sees the 10G of /mnt/ost:

[root@cll01 user]# lsblk
nvme0n1     259:0    0    10G  0 disk

After mounting it with "mount /dev/nvme0n1 /mnt/lustre/", the filesystem type is ext4 and not lustre, as shown below.

[root@cl01 user]# df -hT
/dev/nvme0n1       ext4     9.3G   42M  8.7G   1% /mnt/lustre

We want to mount it as a Lustre filesystem and not as ext4.
Do we need to change the LNet configuration? What else needs to be done?

What you are doing will not work.  Mounting the shared block device directly as type ext4 will corrupt (and probably already has corrupted) the filesystem on /dev/nvme0n1, because neither kernel knows that the device is also in use on the other node.

As a starting point, if you have shared NVMe devices (presumably via NVMeoF) visible on multiple nodes, you should enable the "mmp" feature (Multi-Mount Protection) on the unmounted ext4 filesystem with "tune2fs -O mmp /dev/nvme0n1".  That will at least prevent the filesystem from being mounted directly on two nodes at one time.
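
A minimal sketch of that step, assuming the device name from the example above.  The DEV and DRY_RUN variables and the run/enable_mmp helpers are made up for illustration; by default the script only prints the commands it would run, and the /proc/mounts check only covers the local node.

```shell
#!/bin/sh
# Sketch only: DEV, DRY_RUN, run and enable_mmp are illustrative assumptions.
DEV="${DEV:-/dev/nvme0n1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

enable_mmp() {
    # The feature should be enabled while the filesystem is unmounted;
    # this check only looks at the local node.
    if grep -q "^$1 " /proc/mounts 2>/dev/null; then
        echo "$1 is mounted on this node; unmount it first" >&2
        return 1
    fi
    run tune2fs -O mmp "$1"   # turn on the MMP feature flag
    run dumpe2fs -h "$1"      # superblock summary; "mmp" should appear in the features line
}

enable_mmp "$DEV"
```

Run it once as root on a single node with DRY_RUN=0 after checking that the device is unmounted everywhere.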

However, you still cannot mount this device directly on the client.  Lustre is a network filesystem and not a shared-block filesystem.  If you have IB or RoCE you can use OFED/MOFED with the o2ib LND (Lustre Network Driver), and it will use RDMA to transfer data to/from the OST devices.  With IB RDMA, Lustre can have network bandwidth comparable to locally attached NVMe devices, and can also scale far larger than directly-attached storage would allow.
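
As a rough sketch of the client side, assuming the RDMA interface is named ib0 and that the MGS is reachable over o2ib at the same address used for tcp above (both are assumptions, as are the variable and helper names).  The servers need an o2ib NID configured as well, which is not shown here.  By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: IB_IF, MGS_NID, FSNAME, MNT, DRY_RUN and the helpers are
# illustrative assumptions, not values taken from a real configuration.
IB_IF="${IB_IF:-ib0}"                    # IB/RoCE interface on this client
MGS_NID="${MGS_NID:-11.11.1.211@o2ib}"   # assumes the MGS has this address on o2ib
FSNAME="${FSNAME:-lustre}"
MNT="${MNT:-/mnt/lustre}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mount_over_o2ib() {
    run modprobe lnet                              # load the LNet module
    run lnetctl lnet configure                     # bring LNet up
    run lnetctl net add --net o2ib --if "$IB_IF"   # add an o2ib network on the RDMA interface
    run lnetctl net show                           # check that the o2ib NID is listed
    run lctl ping "$MGS_NID"                       # check that the MGS answers over o2ib
    run mount -t lustre "$MGS_NID:/$FSNAME" "$MNT" # the mount is still a network mount
}

mount_over_o2ib
```

The mount command has the same form as before; only the network type in the NID changes from tcp to o2ib.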

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






