<div dir="ltr">Yes, you are right. Thanks for your great suggestions.<div><br><div>We currently use GlusterFS to store training data for ML, and we have begun investigating Lustre as a replacement for better performance.</div><div><br></div><div>Firstly, yes, we do want maximum performance. Do you mean that we should, for example, use ZFS rather than putting each OST/MDT on a separate partition, for better performance?</div><div><br></div><div>Secondly, we do not use any underlying RAID devices, and we do configure each OST on a separate disk. Since Lustre does not provide data redundancy across disks, we use DRBD + Pacemaker + Corosync for data redundancy and HA; as you can see, we passed --servicenode to mkfs.lustre. I don't know how reliable this solution is. It seems fine in our tests so far: when one disk fails, Pacemaker automatically switches over to the OST on the other machine.</div><div><br></div><div>We also wanted to use ZFS, and I have tested ZFS with mirroring. However, if the physical machine goes down, the data on that machine will be lost, so we decided to use the solution described above.</div><div><br></div><div>We are still testing, and any suggestions are appreciated 😆.</div><div>Thanks, Andreas.</div><div><br></div><div>Yours,</div><div>Yu</div><div><br><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">Andreas Dilger <<a href="mailto:adilger@whamcloud.com">adilger@whamcloud.com</a>> wrote on Wed, Jun 27, 2018 at 7:07 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Jun 27, 2018, at 09:12, yu sun <<a href="mailto:sunyu1949@gmail.com" target="_blank">sunyu1949@gmail.com</a>> wrote:<br>

> <br>

> client:<br>

> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data<br>

> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error<br>

> Is the MGS running?<br>

> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1<br>

> failed to ping 10.82.143.202@o2ib1: Input/output error<br>

> root@ml-gpu-ser200.nmg01:~$<br>

> <br>

> <br>

> mgs and mds:<br>

>     mkfs.lustre --mgs --reformat --servicenode=node28@o2ib1 --servicenode=node29@o2ib1 /dev/sdb1<br>

>     mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode node28@o2ib1 --servicenode node29@o2ib1 --reformat --backfstype=ldiskfs /dev/sdc1<br>

<br>

Separate from the LNet issues, it is probably worthwhile to point out some issues<br>

with your configuration.  You shouldn't use partitions on the OST and MDT devices<br>

if you want to get maximum performance.  That can offset all of the filesystem IO<br>

from the RAID/sector alignment and hurt performance.<br>

<br>
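As an illustration of the whole-device point above, here is a minimal dry-run sketch. It only prints the format commands and does not run them. The flags and node names are copied from the commands quoted in this thread; the helper name print_ost_mkfs is hypothetical, and the only change is that each target is a whole disk (/dev/sdc) instead of a partition (/dev/sdc1).<br>

```shell
#!/bin/sh
# Dry run: print (do not execute) one mkfs.lustre command per whole disk.
# Flags and node names are copied from the commands quoted in this thread;
# the helper name print_ost_mkfs is hypothetical.
print_ost_mkfs() {
    index=$1; shift
    for disk in "$@"; do
        echo "mkfs.lustre --fsname=project --reformat" \
             "--mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1" \
             "--servicenode=node22@o2ib1 --servicenode=node23@o2ib1" \
             "--ost --index=$index /dev/$disk"
        index=$((index + 1))
    done
}

# Whole devices (sdc, sdd), not partitions (sdc1, sdd1).
print_ost_mkfs 12 sdc sdd
```

Printing first makes it easy to review the index-to-disk mapping before anything is reformatted, since --reformat destroys existing data on the target.<br>
<br>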

Secondly, it isn't clear if you are using underlying RAID devices, or if you are<br>

configuring each OST on a separate disk?  It looks like the latter - that you are<br>

making each disk a separate OST.  That isn't a good idea for Lustre, since it does<br>

not (yet) have any redundancy at higher layers, and any disk failure would result<br>

in data loss.  You currently need to have RAID-5/6 or ZFS for each OST/MDT, unless<br>

this is a really "scratch" filesystem where you don't care if the data is lost and<br>

reformatting the filesystem is OK (i.e. low cost is the primary goal, which is fine<br>

also, but not very common).<br>

<br>
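For the ZFS option mentioned above, a hedged dry-run sketch of what a ZFS-backed OST format command could look like. It only prints the command. The pool/dataset name (ost12pool/ost12) and the six-disk raidz2 layout are assumptions made for illustration; the other flags come from the commands quoted in this thread.<br>

```shell
#!/bin/sh
# Dry run: print (do not execute) a ZFS-backed OST format command.
# Assumptions: the pool/dataset name "ost<index>pool/ost<index>" and the
# raidz2 layout are hypothetical; the remaining flags come from the
# commands quoted in this thread.
zfs_ost_cmd() {
    index=$1; shift
    echo "mkfs.lustre --fsname=project --ost --index=$index" \
         "--backfstype=zfs" \
         "--mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1" \
         "--servicenode=node22@o2ib1 --servicenode=node23@o2ib1" \
         "ost${index}pool/ost${index} raidz2 $*"
}

# One raidz2 OST built from six whole disks.
zfs_ost_cmd 12 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
```

Note that redundancy inside the pool only covers disk failures. For failover to the second service node, the disks would also need to be reachable from that node.<br>
<br>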

We are working on Lustre-level data redundancy, and there is some support for this<br>

in the 2.11 release, but it is not yet in a state where you could reliably use it<br>

to mirror all of the files in the filesystem.<br>

<br>

Cheers, Andreas<br>

<br>

> <br>

> ost:<br>

> ml-storage-ser22.nmg01:<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=12 /dev/sdc1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=13 /dev/sdd1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=14 /dev/sde1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=15 /dev/sdf1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=16 /dev/sdg1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=17 /dev/sdh1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=18 /dev/sdi1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=19 /dev/sdj1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=20 /dev/sdk1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=21 /dev/sdl1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=22 /dev/sdm1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=23 /dev/sdn1<br>

> ml-storage-ser26.nmg01:<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=36 /dev/sdc1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=37 /dev/sdd1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=38 /dev/sde1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=39 /dev/sdf1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=40 /dev/sdg1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=41 /dev/sdh1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=42 /dev/sdi1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=43 /dev/sdj1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=44 /dev/sdk1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=45 /dev/sdl1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=46 /dev/sdm1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=47 /dev/sdn1<br>

> <br>

> Thanks<br>

> Yu<br>

> <br>

> Mohr Jr, Richard Frank (Rick Mohr) <<a href="mailto:rmohr@utk.edu" target="_blank">rmohr@utk.edu</a>> wrote on Wed, Jun 27, 2018 at 1:25 PM:<br>

> <br>

> > On Jun 27, 2018, at 12:52 AM, yu sun <<a href="mailto:sunyu1949@gmail.com" target="_blank">sunyu1949@gmail.com</a>> wrote:<br>

> ><br>

> > I have created the file /etc/modprobe.d/lustre.conf with the following content on all MDT, OST, and client nodes:<br>

> > root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf<br>

> > options lnet networks="o2ib1(eth3.2)"<br>

> > and I ran the command lnetctl lnet configure --all to make my static LNet configuration take effect, but I still can't ping node28 from my client ml-gpu-ser200.nmg01, and I can't mount or access Lustre on that client either.<br>

> <br>

> What options did you use when mounting the file system?<br>

> <br>

> --<br>

> Rick Mohr<br>

> Senior HPC System Administrator<br>

> National Institute for Computational Sciences<br>

> <a href="http://www.nics.tennessee.edu" rel="noreferrer" target="_blank">http://www.nics.tennessee.edu</a><br>

> <br>

> _______________________________________________<br>

> lustre-discuss mailing list<br>

> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a><br>

> <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" rel="noreferrer" target="_blank">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>

<br>

Cheers, Andreas<br>

---<br>

Andreas Dilger<br>

Principal Lustre Architect<br>

Whamcloud<br>


</blockquote></div></div></div>