[lustre-discuss] lctl ping node28@o2ib report Input/output error

yu sun sunyu1949 at gmail.com
Wed Jun 27 21:26:52 PDT 2018


Yes, DRBD will mirror the content of block devices between hosts, either
synchronously or asynchronously, which gives us data redundancy across hosts.
Perhaps we should use ZFS + DRBD for the MDTs and OSTs?
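
For reference, a DRBD resource for one OST backing device could look roughly
like the sketch below (the resource name, hostnames, IP addresses, and port
are illustrative placeholders, not taken from our setup):

    # /etc/drbd.d/ost12.res  -- hypothetical example
    resource ost12 {
        device    /dev/drbd0;
        disk      /dev/sdc1;      # local backing disk for this OST
        meta-disk internal;
        on node22 { address 10.0.0.22:7788; }   # placeholder address
        on node23 { address 10.0.0.23:7788; }   # placeholder address
    }

After drbdadm create-md ost12 and drbdadm up ost12 on both hosts (and
drbdadm primary ost12 on the active one), mkfs.lustre would be run against
/dev/drbd0 rather than /dev/sdc1.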

Thanks
Yu

Patrick Farrell <paf at cray.com> wrote on Wed, Jun 27, 2018 at 9:28 PM:

>
> I’m a little puzzled - it can switch, but isn’t the data on the failed
> disk lost...?  That’s why Andreas is suggesting RAID.  Or is drbd doing
> syncing of the disk?  That seems like a really expensive way to get
> redundancy, since it would have to be full online mirroring with all the
> costs in hardware and resource usage that implies...?
>
> ZFS is not a requirement; it generally performs a bit worse than ldiskfs
> but makes up for it with impressive features to improve data integrity and
> related things.  Since it sounds like that’s not a huge concern for you, I
> would stick with ldiskfs.  It will likely be a little faster and is easier
> to set up.
>
> ------------------------------
> *From:* lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
> behalf of yu sun <sunyu1949 at gmail.com>
> *Sent:* Wednesday, June 27, 2018 8:21:43 AM
> *To:* adilger at whamcloud.com
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error
>
> Yes, you are right. Thanks for your great suggestions.
>
> Now we are using GlusterFS to store training data for ML, and we have begun
> investigating Lustre as a replacement for GlusterFS for better performance.
>
> Firstly, yes, we do want to get maximum performance. Do you mean we should
> use ZFS, for example, and not put each OST/MDT on a separate partition, for
> better performance?
>
> Secondly, we don't use any underlying RAID devices, and we do configure each
> OST on a separate disk. Since Lustre does not provide disk-level data
> redundancy, we use DRBD + Pacemaker + Corosync for data redundancy and HA;
> you can see we have configured --servicenode in mkfs.lustre. I don't know how
> reliable this solution is, but it seems OK in our current tests: when one
> disk fails, Pacemaker fails the OST over to the other machine automatically.
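>
> As an illustration, the failover mount for one OST could be expressed as a
> Pacemaker resource roughly like this (only a sketch: the resource name and
> mount point are hypothetical, and the DRBD promotable clone plus the
> ordering/colocation constraints are left out):
>
>     pcs resource create ost12_fs ocf:heartbeat:Filesystem \
>         device=/dev/drbd0 directory=/lustre/ost12 fstype=lustre \
>         op monitor interval=30s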
>
> We also wanted to use ZFS, and I have tested ZFS with a mirror. However, if
> the physical machine goes down, the data on that machine is lost, so we
> decided to use the solution described above.
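>
> (For reference, the kind of single-host ZFS mirror we tested can be
> formatted for Lustre roughly like this; the pool and dataset names are
> hypothetical:)
>
>     mkfs.lustre --fsname=project --ost --index=12 --mgsnode=node28@o2ib1 \
>         --mgsnode=node29@o2ib1 --reformat --backfstype=zfs \
>         ost12pool/ost12 mirror /dev/sdc /dev/sdd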
>
> We are still testing now, and any suggestions are appreciated 😆.
> Thanks, Andreas.
>
> Yours,
> Yu
>
>
>
> Andreas Dilger <adilger at whamcloud.com> wrote on Wed, Jun 27, 2018 at 7:07 PM:
>
> On Jun 27, 2018, at 09:12, yu sun <sunyu1949 at gmail.com> wrote:
> >
> > client:
> > root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> > mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data failed: Input/output error
> > Is the MGS running?
> > root@ml-gpu-ser200.nmg01:~$ lctl ping node28@o2ib1
> > failed to ping 10.82.143.202@o2ib1: Input/output error
> > root@ml-gpu-ser200.nmg01:~$
> >
> >
> > mgs and mds:
> >     mkfs.lustre --mgs --reformat --servicenode=node28@o2ib1 --servicenode=node29@o2ib1 /dev/sdb1
> >     mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode node28@o2ib1 --servicenode node29@o2ib1 --reformat --backfstype=ldiskfs /dev/sdc1
>
> Separate from the LNet issues, it is probably worthwhile to point out some
> issues
> with your configuration.  You shouldn't use partitions on the OST and MDT
> devices
> if you want to get maximum performance.  That can offset all of the
> filesystem IO
> from the RAID/sector alignment and hurt performance.
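>
> For example, a whole-device format looks like your existing commands, just
> without the partition suffix (index and NIDs copied from your earlier
> commands):
>
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 \
>         --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 \
>         --servicenode=node23@o2ib1 --ost --index=12 /dev/sdc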
>
> Secondly, it isn't clear if you are using underlying RAID devices, or if
> you are
> configuring each OST on a separate disk?  It looks like the latter - that
> you are
> making each disk a separate OST.  That isn't a good idea for Lustre, since
> it does
> not (yet) have any redundancy at higher layers, and any disk failure would
> result
> in data loss.  You currently need to have RAID-5/6 or ZFS for each
> OST/MDT, unless
> this is a really "scratch" filesystem where you don't care if the data is
> lost and
> reformatting the filesystem is OK (i.e. low cost is the primary goal,
> which is fine
> also, but not very common).
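>
> For instance, a software-RAID OST could be assembled roughly as follows
> (a sketch only; the md device name and the set of disks are hypothetical):
>
>     mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[c-h]
>     mkfs.lustre --fsname=project --ost --index=12 --mgsnode=node28@o2ib1 \
>         --mgsnode=node29@o2ib1 --reformat /dev/md0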
>
> We are working on Lustre-level data redundancy, and there is some support
> for this
> in the 2.11 release, but it is not yet in a state where you could reliably
> use it
> to mirror all of the files in the filesystem.
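>
> (For what it's worth, the interface in 2.11 is the "lfs mirror" command set;
> a minimal sketch with a hypothetical file name:)
>
>     lfs mirror create -N2 /mnt/lustre_data/somefile   # new file with two mirror copies
>     lfs mirror resync /mnt/lustre_data/somefile       # resync stale copies after writes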
>
> Cheers, Andreas
>
> >
> > ost:
> > ml-storage-ser22.nmg01:
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=12 /dev/sdc1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=13 /dev/sdd1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=14 /dev/sde1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=15 /dev/sdf1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=16 /dev/sdg1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=17 /dev/sdh1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=18 /dev/sdi1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=19 /dev/sdj1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=20 /dev/sdk1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=21 /dev/sdl1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=22 /dev/sdm1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 --ost --index=23 /dev/sdn1
> > ml-storage-ser26.nmg01:
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=36 /dev/sdc1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=37 /dev/sdd1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=38 /dev/sde1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=39 /dev/sdf1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=40 /dev/sdg1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=41 /dev/sdh1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=42 /dev/sdi1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=43 /dev/sdj1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=44 /dev/sdk1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=45 /dev/sdl1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=46 /dev/sdm1
> >     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 --ost --index=47 /dev/sdn1
> >
> > Thanks
> > Yu
> >
> > Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote on Wed, Jun 27, 2018 at 1:25 PM:
> >
> > > On Jun 27, 2018, at 12:52 AM, yu sun <sunyu1949 at gmail.com> wrote:
> > >
> > > I have created the file /etc/modprobe.d/lustre.conf with this content on
> > > all MDT, OST, and client nodes:
> > > root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
> > > options lnet networks="o2ib1(eth3.2)"
> > > and I ran lnetctl lnet configure --all to make my static LNet configuration
> > > take effect, but I still can't ping node28 from my client
> > > ml-gpu-ser200.nmg01. I can mount as well as access Lustre on client
> > > ml-gpu-ser200.nmg01.
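> > >
> > > (For reference, a minimal way to check the LNet side on the client, using
> > > lnetctl as shipped with Lustre 2.10+:)
> > >
> > >     lnetctl lnet configure       # load and initialize LNet
> > >     lnetctl net show             # confirm the o2ib1 NI on eth3.2 is present
> > >     lnetctl ping node28@o2ib1    # LNet-level ping, like lctl ping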
> >
> > What options did you use when mounting the file system?
> >
> > --
> > Rick Mohr
> > Senior HPC System Administrator
> > National Institute for Computational Sciences
> > http://www.nics.tennessee.edu
> >
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> Cheers, Andreas
> ---
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>