<div dir="ltr">Yes, DRBD mirrors the contents of block devices between hosts, either synchronously or asynchronously. That gives us data <span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">redundancy between hosts.</span><div>Perhaps we should use ZFS + DRBD for the MDT and OSTs?</div><div><br></div><div>Thanks</div><div>Yu</div></div><br><div class="gmail_quote"><div dir="ltr">Patrick Farrell <<a href="mailto:paf@cray.com">paf@cray.com</a>> wrote on Wed, Jun 27, 2018 at 9:28 PM:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>

<br>

I’m a little puzzled - it can switch, but isn’t the data on the failed disk lost...?  That’s why Andreas is suggesting RAID.  Or is drbd doing syncing of the disk?  That seems like a really expensive way to get redundancy, since it would have to be full online

 mirroring with all the costs in hardware and resource usage that implies...?<br>

<br>

ZFS is not a requirement, it generally performs a bit worse than ldiskfs but makes it up with impressive features to improve data integrity and related things.  Since it sounds like that’s not a huge concern for you, I would stick with ldiskfs.  It will likely

 be a little faster and is easier to set up.<br>

<br>

<hr style="display:inline-block;width:98%">

<div id="m_1665947129306950335divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> lustre-discuss <<a href="mailto:lustre-discuss-bounces@lists.lustre.org" target="_blank">lustre-discuss-bounces@lists.lustre.org</a>> on behalf of yu sun <<a href="mailto:sunyu1949@gmail.com" target="_blank">sunyu1949@gmail.com</a>><br>

<b>Sent:</b> Wednesday, June 27, 2018 8:21:43 AM<br>

<b>To:</b> <a href="mailto:adilger@whamcloud.com" target="_blank">adilger@whamcloud.com</a><br>

<b>Cc:</b> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a><br>

<b>Subject:</b> Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error</font>

<div> </div>

</div>


<div>

<div dir="ltr"><span style="float:none;display:inline">Yes, you are right. Thanks for your great suggestions.</span>

<div><br>

<div><span style="float:none;display:inline">We currently use GlusterFS to store training data for ML, and we have begun investigating Lustre as a replacement for GlusterFS for better performance.</span></div>

<div><br>

</div>

<div>Firstly, yes, we do want maximum performance. Do you mean that we should use, for example, ZFS, rather than putting each OST/MDT on a separate partition, for better performance?</div>

<div><br>

</div>

<div>Secondly, we don't use any underlying RAID devices, and we do configure each OST on a separate disk. Since Lustre does not provide disk data redundancy, we are using DRBD + Pacemaker + Corosync for data

<span style="background-color:rgb(255,255,255);float:none;display:inline">redundancy and HA; you can see that we configured --<span style="float:none;display:inline">servicenode when running mkfs.lustre. I don't know how reliable this solution is. It seems OK in

 our current tests: when one disk fails, Pacemaker can switch to the OST on the other machine </span></span>automatically.</div>

<div><span style="background-color:rgb(255,255,255);float:none;display:inline"><span style="float:none;display:inline"><br>

</span></span></div>

<div><span style="background-color:rgb(255,255,255);float:none;display:inline"><span style="float:none;display:inline">We also wanted to use ZFS, and I have tested ZFS with mirroring. However, if the physical machine goes down, the data on that machine will be lost, so we decided

 to use the solution described above.</span></span></div>

<div><br>

</div>

<div>We are still testing, and any suggestions are appreciated 😆.</div>

<div>Thanks, <span style="background-color:rgb(255,255,255);float:none;display:inline">

Andreas.</span></div>

<div><span style="background-color:rgb(255,255,255);float:none;display:inline"><span style="float:none;display:inline"><br>

</span></span></div>

<div><span style="background-color:rgb(255,255,255);float:none;display:inline"><span style="float:none;display:inline">Yours,</span></span></div>

<div><span style="background-color:rgb(255,255,255);float:none;display:inline"><span style="float:none;display:inline">Yu</span></span></div>

<div><br>

<div><span style="float:none;display:inline"><br>

</span></div>

</div>

<br>

<div class="m_1665947129306950335x_gmail_quote">

<div dir="ltr">Andreas Dilger <<a href="mailto:adilger@whamcloud.com" target="_blank">adilger@whamcloud.com</a>> wrote on Wed, Jun 27, 2018 at 7:07 PM:<br>

</div>

<blockquote class="m_1665947129306950335x_gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

On Jun 27, 2018, at 09:12, yu sun <<a href="mailto:sunyu1949@gmail.com" target="_blank">sunyu1949@gmail.com</a>> wrote:<br>

> <br>

> client:<br>

> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data<br>

> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error<br>

> Is the MGS running?<br>

> root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1<br>

> failed to ping 10.82.143.202@o2ib1: Input/output error<br>

> root@ml-gpu-ser200.nmg01:~$<br>

> <br>

> <br>

> mgs and mds:<br>

>     mkfs.lustre --mgs --reformat --servicenode=node28@o2ib1 --servicenode=node29@o2ib1 /dev/sdb1<br>

>     mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode node28@o2ib1 --servicenode node29@o2ib1 --reformat --backfstype=ldiskfs /dev/sdc1<br>

<br>

Separate from the LNet issues, it is probably worthwhile to point out some issues<br>

with your configuration.  You shouldn't use partitions on the OST and MDT devices<br>

if you want to get maximum performance.  That can offset all of the filesystem IO<br>

from the RAID/sector alignment and hurt performance.<br>

<br>

Secondly, it isn't clear if you are using underlying RAID devices, or if you are<br>

configuring each OST on a separate disk?  It looks like the latter - that you are<br>

making each disk a separate OST.  That isn't a good idea for Lustre, since it does<br>

not (yet) have any redundancy at higher layers, and any disk failure would result<br>

in data loss.  You currently need to have RAID-5/6 or ZFS for each OST/MDT, unless<br>

this is a really "scratch" filesystem where you don't care if the data is lost and<br>

reformatting the filesystem is OK (i.e. low cost is the primary goal, which is fine<br>

also, but not very common).<br>

<br>

We are working on Lustre-level data redundancy, and there is some support for this<br>

in the 2.11 release, but it is not yet in a state where you could reliably use it<br>

to mirror all of the files in the filesystem.<br>

<br>

Cheers, Andreas<br>

<br>

> <br>

> ost:<br>

> ml-storage-ser22.nmg01:<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=12 /dev/sdc1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=13 /dev/sdd1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=14 /dev/sde1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=15 /dev/sdf1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=16 /dev/sdg1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=17 /dev/sdh1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=18 /dev/sdi1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=19 /dev/sdj1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=20 /dev/sdk1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=21 /dev/sdl1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=22 /dev/sdm1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=23 /dev/sdn1<br>

> ml-storage-ser26.nmg01:<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=36 /dev/sdc1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=37 /dev/sdd1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=38 /dev/sde1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=39 /dev/sdf1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=40 /dev/sdg1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=41 /dev/sdh1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=42 /dev/sdi1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=43 /dev/sdj1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=44 /dev/sdk1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=45 /dev/sdl1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=46 /dev/sdm1<br>

>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=47 /dev/sdn1<br>

> <br>

> Thanks<br>

> Yu<br>

> <br>

> Mohr Jr, Richard Frank (Rick Mohr) <<a href="mailto:rmohr@utk.edu" target="_blank">rmohr@utk.edu</a>> wrote on Wed, Jun 27, 2018 at 1:25 PM:<br>

> <br>

> > On Jun 27, 2018, at 12:52 AM, yu sun <<a href="mailto:sunyu1949@gmail.com" target="_blank">sunyu1949@gmail.com</a>> wrote:<br>

> ><br>

> > I have created the file /etc/modprobe.d/lustre.conf with the following content on all MDT, OST, and client nodes:<br>

> > root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf<br>

> > options lnet networks="o2ib1(eth3.2)"<br>

> > and I ran the command lnetctl lnet configure --all to make my static LNet configuration take effect, but I still can't ping node28 from my client ml-gpu-ser200.nmg01. I can mount as well as access Lustre on client ml-gpu-ser200.nmg01.<br>

> <br>

> What options did you use when mounting the file system?<br>

> <br>

> --<br>

> Rick Mohr<br>

> Senior HPC System Administrator<br>

> National Institute for Computational Sciences<br>

> <a href="http://www.nics.tennessee.edu" rel="noreferrer" target="_blank">http://www.nics.tennessee.edu</a><br>

> <br>

> _______________________________________________<br>

> lustre-discuss mailing list<br>

> <a href="mailto:lustre-discuss@lists.lustre.org" target="_blank">lustre-discuss@lists.lustre.org</a><br>

> <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org" rel="noreferrer" target="_blank">

http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>

<br>

Cheers, Andreas<br>

---<br>

Andreas Dilger<br>

Principal Lustre Architect<br>

Whamcloud<br>


</blockquote>

</div>

</div>

</div>

</div>

</div>


</blockquote></div>