[lustre-discuss] Problems on mds/mgs

Sven Schumacher schumacher at tfd.uni-hannover.de
Wed Apr 22 04:54:56 PDT 2015


Hello,

I always get the following error when following the steps described below:
> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
> 0@lo, operation mds_connect failed with -11.
If anyone has a helpful hint, I would be grateful.

Thanks in advance

Sven


What I have: 4 Lustre servers with 2 InfiniBand ports
(Mellanox ConnectX cards).
InfiniBand is configured on the MDS as follows:
> 6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
> state UP qlen 256
>     link/infiniband
> 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c1 brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 10.69.100.5/24 brd 10.69.100.255 scope global ib0
> 7: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
> state UP qlen 256
>     link/infiniband
> 80:00:00:49:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c2 brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>
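
A note on the interface listing above: only ib0 carries an IPv4 address (10.69.100.5/24), while ib1 has none. A minimal shell sketch for pulling that out of `ip addr`-style text is shown below. The helper name `list_ipv4` and the embedded sample (the quoted output with wrapped lines rejoined and the link/infiniband lines left out) are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sample taken from the interface listing quoted above, with wrapped lines
# rejoined and the link/infiniband lines omitted (not needed here).
sample='6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    inet 10.69.100.5/24 brd 10.69.100.255 scope global ib0
7: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256'

# list_ipv4 (hypothetical helper): read `ip addr`-style text on stdin and
# print "<interface> <address/prefix>", or "<interface> none" when no inet
# line follows the interface header.
list_ipv4() {
    awk '
        /^[0-9]+: / {                 # interface header line
            if (name != "") print name, (addr == "" ? "none" : addr)
            name = $2; sub(/:$/, "", name); addr = ""
        }
        $1 == "inet" { addr = $2 }    # IPv4 address line
        END { if (name != "") print name, (addr == "" ? "none" : addr) }
    '
}

printf '%s\n' "$sample" | list_ipv4
```

On a live node the same function could presumably be fed directly, for example with `ip addr | list_ipv4`.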

What I would like to have:
MDS/MGS on one server
OSS on 3 servers, each with 2 OSTs.

I ran the following on the MDS:
> # mkfs.lustre --fsname=BIGWORK --mgs --mdt --index=0
> --mgsnode=10.69.100.5@o2ib0 --reformat /dev/vg_mds/mdsmgs
>    Permanent disk data:
> Target:     BIGWORK:MDT0000
> Index:      0
> Lustre FS:  BIGWORK
> Mount type: ldiskfs
> Flags:      0x65
>               (MDT MGS first_time update )
> Persistent mount opts: user_xattr,errors=remount-ro
> Parameters: mgsnode=10.69.100.5@o2ib
>
> device size = 1116156MB
> formatting backing filesystem ldiskfs on /dev/vg_mds/mdsmgs
>         target name  BIGWORK:MDT0000
>         4k blocks     285735936
>         options        -J size=400 -I 512 -i 2048 -q -O
> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> lazy_journal_init -F
> mkfs_cmd = mke2fs -j -b 4096 -L BIGWORK:MDT0000  -J size=400 -I 512 -i
> 2048 -q -O
> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> lazy_journal_init -F /dev/vg_mds/mdsmgs 285735936
> Writing CONFIGS/mountdata
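
The mkfs.lustre output above can be cross-checked with a little arithmetic: 285735936 blocks of 4096 bytes come to exactly the reported 1116156 MB, and Flags 0x65 decomposes into the four names printed next to it. The sketch below does both. The helper names are hypothetical, and the bit values (0x01 MDT, 0x04 MGS, 0x20 first_time, 0x40 update) are assumed from Lustre's on-disk flag definitions and listed only as far as this output needs them.

```shell
#!/bin/sh
# decode_ldd_flags (hypothetical helper): name the bits set in a "Flags:"
# value. Bit values are an assumption based on Lustre's LDD_F_* definitions;
# only the four bits that appear in the output above are covered.
decode_ldd_flags() {
    local flags out
    flags=$(( $1 ))
    out=""
    [ $(( flags & 0x01 )) -ne 0 ] && out="${out}MDT "
    [ $(( flags & 0x04 )) -ne 0 ] && out="${out}MGS "
    [ $(( flags & 0x20 )) -ne 0 ] && out="${out}first_time "
    [ $(( flags & 0x40 )) -ne 0 ] && out="${out}update "
    printf '%s\n' "${out% }"          # strip the trailing space
}

# blocks_to_mib (hypothetical helper): convert a block count and a block
# size in bytes to MiB using integer arithmetic.
blocks_to_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

decode_ldd_flags 0x65           # prints: MDT MGS first_time update
blocks_to_mib 285735936 4096    # prints: 1116156
```

Both results agree with what mkfs.lustre printed, so the target looks like it was formatted as a combined, freshly created MGS/MDT of the expected size.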

And dmesg shows:
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts: 

But "mount" does not show any Lustre filesystem mounted, so I run:
> # mount -t lustre /dev/vg_mds/mdsmgs /lustre/mdsmgs
> mount.lustre: set /sys/block/dm-2/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/dm-1/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/dm-0/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/sdc/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sdd/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sde/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sdf/queue/max_sectors_kb to 32767

Now dmesg shows:
>
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts:
> LNet: HW CPU cores: 24, npartitions: 4
> padlock: VIA PadLock Hash Engine not detected.
> Lustre: Lustre: Build Version:
> 2.5.3.90--CHANGED-2.6.32-431.23.3.el6.lustre
> LNet: Added LNI 10.69.100.5@o2ib [8/256/0/180]
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts:
> Lustre: ctl-BIGWORK-MDT0000: No data found on store. Initialize space
> Lustre: BIGWORK-MDT0000: new disk, initializing
> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
> 0@lo, operation mds_connect failed with -11.

So what could be wrong here?
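
For reading the error itself: Lustre reports return codes as negated Linux errno values, so -11 corresponds to EAGAIN ("try again"). That alone does not say why the connect could not complete at that moment. A small sketch for extracting and naming the code from such a log line follows. `rc_from_log` and `errno_name` are hypothetical helpers, the table covers only a few common codes, and the sample line is the message above with its wrapped lines rejoined and the NID written as 0@lo.

```shell
#!/bin/sh
# errno_name (hypothetical helper): map a few Linux errno numbers to their
# symbolic names. The table is deliberately incomplete.
errno_name() {
    local n
    n="${1#-}"                  # drop a leading minus sign: "-11" -> "11"
    case "$n" in
        2)   echo ENOENT ;;
        5)   echo EIO ;;
        11)  echo EAGAIN ;;
        16)  echo EBUSY ;;
        19)  echo ENODEV ;;
        110) echo ETIMEDOUT ;;
        *)   echo "unknown($n)" ;;
    esac
}

# rc_from_log (hypothetical helper): read log text on stdin and print the
# return code from any "... failed with -N." message.
rc_from_log() {
    sed -n 's/.*failed with \(-[0-9][0-9]*\).*/\1/p'
}

# Sample line: the dmesg message quoted above, rejoined onto one line.
line='LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11.'
rc=$(printf '%s\n' "$line" | rc_from_log)
echo "$rc $(errno_name "$rc")"  # prints: -11 EAGAIN
```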

lsmod lists the following Lustre-related modules:
> Module                  Size  Used by
> osp                   242759  1
> mdd                   284205  3
> lfsck                 103130  4
> lod                   263636  3
> mdt                   746013  4
> mgs                   281619  1
> mgc                    82367  2
> fsfilt_ldiskfs          5865  1
> osd_ldiskfs           452528  4
> lquota                345916  11
> lustre                919263  0
> mdc                   201643  1
> lov                   514967  1
> osc                   392643  1
> fid                    82230  9
> fld                    84131  8
> ko2iblnd              239245  1
> ptlrpc               1665273  16
> obdclass             1263221  77
> lvfs                   16685  19
> lnet                  344978  4
> sha512_generic          5198  0
> sha256_generic         10425  0
> crc32c_intel            2015  0
> libcfs                495892  21
> ldiskfs               425708  3



lsmod lists the following InfiniBand-related modules:
> ib_ipoib               80756  0
> ib_srp                 32208  0
> scsi_transport_srp      5487  1
> rdma_ucm               16185  0
> rdma_cm                38340  2
> ib_addr                 6606  2
> iw_cm                   8657  1
> ib_uverbs              34909  1
> ib_cm                  36936  3
> ipv6                  319905  2
> ib_umad                11686  0
> mlx4_ib               126642  0
> ib_sa                  24113  6
> ib_mad                 39070  4
> ib_core                74419  12
> mlx4_core             212574  1 




-- 
Sven Schumacher - Systemadministrator Tel: (0511)762-2753
Leibniz Universitaet Hannover
Institut für Turbomaschinen und Fluid-Dynamik       - TFD
Appelstraße 9 - 30167 Hannover
Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
Callinstraße 36 - 30167 Hannover


