[lustre-discuss] lctl ping node28 at o2ib report Input/output error

Andreas Dilger adilger at whamcloud.com
Wed Jun 27 04:07:37 PDT 2018


On Jun 27, 2018, at 09:12, yu sun <sunyu1949 at gmail.com> wrote:
> 
> client:
> root at ml-gpu-ser200.nmg01:~$ mount -t lustre node28 at o2ib1:node29 at o2ib1:/project /mnt/lustre_data
> mount.lustre: mount node28 at o2ib1:node29 at o2ib1:/project at /mnt/lustre_data failed: Input/output error
> Is the MGS running?
> root at ml-gpu-ser200.nmg01:~$ lctl ping node28 at o2ib1
> failed to ping 10.82.143.202 at o2ib1: Input/output error
> root at ml-gpu-ser200.nmg01:~$
> 
> 
> mgs and mds:
>     mkfs.lustre --mgs --reformat --servicenode=node28 at o2ib1 --servicenode=node29 at o2ib1 /dev/sdb1
>     mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1 --servicenode node28 at o2ib1 --servicenode node29 at o2ib1 --reformat --backfstype=ldiskfs /dev/sdc1

Separate from the LNet issues, it is probably worthwhile to point out some problems
with your configuration.  You shouldn't use partitions on the OST and MDT devices
if you want maximum performance.  Partitioning offsets all of the filesystem IO
from the RAID/sector alignment, which hurts performance.
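As a sketch (device names taken from your commands; the point is simply to format the whole block device rather than a partition on it):

```shell
# Format the whole device (/dev/sdb, not /dev/sdb1) so the filesystem
# stays aligned with the underlying RAID/sector geometry:
mkfs.lustre --mgs --reformat \
    --servicenode=node28@o2ib1 --servicenode=node29@o2ib1 \
    /dev/sdb
```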

Secondly, it isn't clear whether you are using underlying RAID devices, or
configuring each OST on a separate disk?  It looks like the latter - that you are
making each disk a separate OST.  That isn't a good idea for Lustre, since it does
not (yet) have any redundancy at higher layers, so any disk failure would result
in data loss.  You currently need RAID-5/6 or ZFS for each OST/MDT, unless
this is really a "scratch" filesystem where you don't care if the data is lost and
reformatting the filesystem is OK (i.e. low cost is the primary goal, which is fine
also, but not very common).
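For example, one OSS could aggregate its disks into a single RAID-6 array and format that as one OST instead of twelve.  A rough sketch (hypothetical md device name; array geometry depends on your hardware):

```shell
# Build a RAID-6 array from the individual disks, tolerating two
# disk failures, then format the md device as a single OST:
mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[c-n]
mkfs.lustre --fsname=project --ost --index=12 \
    --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 \
    --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 \
    /dev/md0
```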

We are working on Lustre-level data redundancy, and there is some support for this
in the 2.11 release, but it is not yet in a state where you could reliably use it
to mirror all of the files in the filesystem.
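For reference, the file-level replication (FLR) support in 2.11 is per-file and needs manual resync; a sketch (hypothetical file path):

```shell
# Create a file with two mirrors (FLR, new in Lustre 2.11).  Mirrors
# are not resynced automatically after writes; that is a manual step:
lfs mirror create -N2 /mnt/lustre_data/important.dat
# After writes leave one mirror stale, bring it back in sync:
lfs mirror resync /mnt/lustre_data/important.dat
```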

Cheers, Andreas

> 
> ost:
> ml-storage-ser22.nmg01:
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=12 /dev/sdc1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=13 /dev/sdd1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=14 /dev/sde1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=15 /dev/sdf1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=16 /dev/sdg1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=17 /dev/sdh1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=18 /dev/sdi1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=19 /dev/sdj1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=20 /dev/sdk1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=21 /dev/sdl1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=22 /dev/sdm1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node22 at o2ib1 --servicenode=node23 at o2ib1 --ost --index=23 /dev/sdn1
> ml-storage-ser26.nmg01:
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=36 /dev/sdc1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=37 /dev/sdd1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=38 /dev/sde1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=39 /dev/sdf1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=40 /dev/sdg1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=41 /dev/sdh1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=42 /dev/sdi1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=43 /dev/sdj1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=44 /dev/sdk1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=45 /dev/sdl1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=46 /dev/sdm1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28 at o2ib1 --mgsnode=node29 at o2ib1  --servicenode=node26 at o2ib1 --servicenode=node27 at o2ib1 --ost --index=47 /dev/sdn1
> 
> Thanks
> Yu
> 
> Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> 于2018年6月27日周三 下午1:25写道:
> 
> > On Jun 27, 2018, at 12:52 AM, yu sun <sunyu1949 at gmail.com> wrote:
> >
> > I have create file /etc/modprobe.d/lustre.conf with content on all mdt ost and client:
> > root at ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
> > options lnet networks="o2ib1(eth3.2)"
> > and I executed `lnetctl lnet configure --all` to make my static lnet configuration take effect, but I still can't ping node28 from my client ml-gpu-ser200.nmg01.  Nor can I mount or access Lustre on client ml-gpu-ser200.nmg01.
> 
> What options did you use when mounting the file system?
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud

