[lustre-discuss] Problems on mds/mgs
Sven Schumacher
schumacher at tfd.uni-hannover.de
Wed Apr 22 04:54:56 PDT 2015
Hello,
I always get the following error when I follow the steps described below:
> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
> 0@lo, operation mds_connect failed with -11.
If anyone has a hint, I would be glad to hear it.
Thanks in advance
Sven
What I have: 4 servers for Lustre with 2 InfiniBand ports
(Mellanox ConnectX cards)
InfiniBand is configured on the MDS:
> 6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
> state UP qlen 256
> link/infiniband
> 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c1 brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 10.69.100.5/24 brd 10.69.100.255 scope global ib0
> 7: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
> state UP qlen 256
> link/infiniband
> 80:00:00:49:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c2 brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>
What I would like to have:
MDS/MGS on one server
OSS on 3 servers, each with 2 OSTs.
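To make the intended layout concrete, here is a minimal sketch that lists the targets this plan would produce. The host names (mds, oss1 to oss3) are hypothetical placeholders; only the fsname and the 3 x 2 layout come from this post. It assumes the usual target naming of fsname, target type and a four-digit hexadecimal index, which matches the BIGWORK-MDT0000 name in the logs.

```python
# Hypothetical host names; only FSNAME and the 3 x 2 layout are from the post.
FSNAME = "BIGWORK"
OSS_HOSTS = ["oss1", "oss2", "oss3"]
OSTS_PER_OSS = 2


def plan_targets(fsname, oss_hosts, osts_per_oss):
    """Return (host, target_name) pairs: one combined MGS/MDT plus all OSTs."""
    plan = [("mds", f"{fsname}-MDT0000")]
    index = 0
    for host in oss_hosts:
        for _ in range(osts_per_oss):
            # Assumed naming: <fsname>-OST<index as 4 hex digits>.
            plan.append((host, f"{fsname}-OST{index:04x}"))
            index += 1
    return plan


if __name__ == "__main__":
    for host, target in plan_targets(FSNAME, OSS_HOSTS, OSTS_PER_OSS):
        print(f"{host}: {target}")
```

Each OST index has to be unique across the whole filesystem, which is why the counter runs across all three servers instead of restarting per host.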
I did the following on the MDS:
> # mkfs.lustre --fsname=BIGWORK --mgs --mdt --index=0
> --mgsnode=10.69.100.5@o2ib0 --reformat /dev/vg_mds/mdsmgs
> Permanent disk data:
> Target: BIGWORK:MDT0000
> Index: 0
> Lustre FS: BIGWORK
> Mount type: ldiskfs
> Flags: 0x65
> (MDT MGS first_time update )
> Persistent mount opts: user_xattr,errors=remount-ro
> Parameters: mgsnode=10.69.100.5@o2ib
>
> device size = 1116156MB
> formatting backing filesystem ldiskfs on /dev/vg_mds/mdsmgs
> target name BIGWORK:MDT0000
> 4k blocks 285735936
> options -J size=400 -I 512 -i 2048 -q -O
> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> lazy_journal_init -F
> mkfs_cmd = mke2fs -j -b 4096 -L BIGWORK:MDT0000 -J size=400 -I 512 -i
> 2048 -q -O
> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
> lazy_journal_init -F /dev/vg_mds/mdsmgs 285735936
> Writing CONFIGS/mountdata
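As a sanity check on the mkfs.lustre output, the following sketch decodes the Flags value and recomputes the block count from the reported device size. The bit values are an assumption taken from my reading of the LDD_F_* definitions in Lustre's lustre_disk.h; the labels are the ones mkfs.lustre printed.

```python
# Assumed bit values (LDD_F_* in lustre_disk.h); labels as printed by mkfs.lustre.
LDD_FLAGS = {
    0x0001: "MDT",
    0x0002: "OST",
    0x0004: "MGS",
    0x0020: "first_time",
    0x0040: "update",
}


def decode_flags(flags):
    """Return the labels of all known bits set in flags, lowest bit first."""
    return [name for bit, name in sorted(LDD_FLAGS.items()) if flags & bit]


def blocks_for(size_mb, block_size=4096):
    """Number of whole blocks of block_size bytes in a device of size_mb MiB."""
    return size_mb * 1024 * 1024 // block_size


if __name__ == "__main__":
    print(decode_flags(0x65))
    print(blocks_for(1116156))
```

With these assumptions, 0x65 decodes to the same four labels shown in the output, and 1116156 MB at 4 KiB per block gives the 285735936 blocks passed to mke2fs, so the format step itself looks consistent.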
And dmesg shows:
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts:
But "mount" does not show any Lustre filesystem mounted, so I run:
> # mount -t lustre /dev/vg_mds/mdsmgs /lustre/mdsmgs
> mount.lustre: set /sys/block/dm-2/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/dm-1/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/dm-0/queue/max_sectors_kb to 127
>
> mount.lustre: set /sys/block/sdc/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sdd/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sde/queue/max_sectors_kb to 32767
>
> mount.lustre: set /sys/block/sdf/queue/max_sectors_kb to 32767
Now dmesg shows:
>
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts:
> LNet: HW CPU cores: 24, npartitions: 4
> padlock: VIA PadLock Hash Engine not detected.
> Lustre: Lustre: Build Version:
> 2.5.3.90--CHANGED-2.6.32-431.23.3.el6.lustre
> LNet: Added LNI 10.69.100.5@o2ib [8/256/0/180]
> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
> quota=on. Opts:
> Lustre: ctl-BIGWORK-MDT0000: No data found on store. Initialize space
> Lustre: BIGWORK-MDT0000: new disk, initializing
> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
> 0@lo, operation mds_connect failed with -11.
So what could be wrong here?
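For reference, the trailing -11 looks like a negated Linux errno. The sketch below pulls the fields out of the LustreError line (pasted as a single line) and names the error code. The small errno table is an assumption based on generic Linux numbering, where 11 is EAGAIN ("resource temporarily unavailable"); it is not taken from Lustre itself.

```python
import re

# Assumed generic Linux errno numbering; only a few entries are listed.
LINUX_ERRNO = {2: "ENOENT", 5: "EIO", 11: "EAGAIN", 107: "ENOTCONN", 110: "ETIMEDOUT"}

PATTERN = re.compile(
    r"LustreError: (?P<code>\S+): (?P<device>\S+): Communicating with\s+"
    r"(?P<nid>\S+), operation (?P<op>\w+) failed with (?P<rc>-?\d+)"
)


def parse_lustre_error(line):
    """Extract device, NID, operation and return code from a LustreError line."""
    # Some mail archives render '@' as ' at '; undo that before matching.
    line = line.replace(" at ", "@")
    m = PATTERN.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "device": m.group("device"),
        "nid": m.group("nid"),
        "operation": m.group("op"),
        "rc": rc,
        "errno": LINUX_ERRNO.get(-rc, "unknown"),
    }


if __name__ == "__main__":
    sample = ("LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with "
              "0@lo, operation mds_connect failed with -11.")
    print(parse_lustre_error(sample))
```

So the message says that the connect attempt from BIGWORK-MDT0000-lwp-MDT0000 to the loopback NID 0@lo returned EAGAIN, that is, the MDT asked its own local connection to try again.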
lsmod lists the following modules (which belong to Lustre):
> Module Size Used by
> osp 242759 1
> mdd 284205 3
> lfsck 103130 4
> lod 263636 3
> mdt 746013 4
> mgs 281619 1
> mgc 82367 2
> fsfilt_ldiskfs 5865 1
> osd_ldiskfs 452528 4
> lquota 345916 11
> lustre 919263 0
> mdc 201643 1
> lov 514967 1
> osc 392643 1
> fid 82230 9
> fld 84131 8
> ko2iblnd 239245 1
> ptlrpc 1665273 16
> obdclass 1263221 77
> lvfs 16685 19
> lnet 344978 4
> sha512_generic 5198 0
> sha256_generic 10425 0
> crc32c_intel 2015 0
> libcfs 495892 21
> ldiskfs 425708 3
lsmod lists the following modules (which belong to InfiniBand):
> ib_ipoib 80756 0
> ib_srp 32208 0
> scsi_transport_srp 5487 1
> rdma_ucm 16185 0
> rdma_cm 38340 2
> ib_addr 6606 2
> iw_cm 8657 1
> ib_uverbs 34909 1
> ib_cm 36936 3
> ipv6 319905 2
> ib_umad 11686 0
> mlx4_ib 126642 0
> ib_sa 24113 6
> ib_mad 39070 4
> ib_core 74419 12
> mlx4_core 212574 1
--
Sven Schumacher - Systemadministrator Tel: (0511)762-2753
Leibniz Universitaet Hannover
Institut für Turbomaschinen und Fluid-Dynamik - TFD
Appelstraße 9 - 30167 Hannover
Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
Callinstraße 36 - 30167 Hannover