[lustre-discuss] Problems on mds/mgs

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Wed Apr 22 05:58:38 PDT 2015


Do you have any OSTs formatted and mounted?  I believe you will get spurious errors if you mount the MDT without any OSTs, but I can't remember whether it's the same error you are seeing.
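
One way to check this on each server is to look for entries of type lustre in the mounts table. The following is only a minimal sketch, assuming that mounted server targets appear in /proc/mounts with filesystem type "lustre"; the function names list_lustre_mounts and has_lustre_mounts are hypothetical helpers, not part of Lustre.

```shell
#!/bin/sh
# Print "device mountpoint" for every Lustre entry in a mounts table.
# $1: path to a file in /proc/mounts format (defaults to /proc/mounts).
# Fields per line: device, mount point, filesystem type, options, dump, pass.
list_lustre_mounts() {
    awk '$3 == "lustre" { print $1, $2 }' "${1:-/proc/mounts}"
}

# Succeed (exit status 0) if at least one Lustre entry is present.
has_lustre_mounts() {
    [ -n "$(list_lustre_mounts "$1")" ]
}
```

Running list_lustre_mounts with no argument on an OSS should list its OST devices if they are mounted. Note that this does not distinguish an OST from an MDT; it only shows whether anything of type lustre is mounted on that node.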

-- Rick

> On Apr 22, 2015, at 7:55 AM, Sven Schumacher <schumacher at tfd.uni-hannover.de> wrote:
> 
> Hello,
> 
> I always get the following error when following the steps described below:
>> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
>> 0@lo, operation mds_connect failed with -11.
> If anyone has a helpful hint, I would appreciate it.
> 
> Thanks in advance
> 
> Sven
> 
> 
> What I have: 4 servers for Lustre with 2 InfiniBand ports
> (Mellanox ConnectX cards).
> InfiniBand is configured on the MDS:
>> 6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
>> state UP qlen 256
>>    link/infiniband
>> 80:00:00:48:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c1 brd
>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
>>    inet 10.69.100.5/24 brd 10.69.100.255 scope global ib0
>> 7: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
>> state UP qlen 256
>>    link/infiniband
>> 80:00:00:49:fe:80:00:00:00:00:00:00:f4:52:14:03:00:57:e1:c2 brd
>> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> 
> What I would like to have:
> MDS/MGS on one server
> OSS on 3 servers, each with 2 OSTs.
> 
> I did the following on the MDS:
>> # mkfs.lustre --fsname=BIGWORK --mgs --mdt --index=0
>> --mgsnode=10.69.100.5@o2ib0 --reformat /dev/vg_mds/mdsmgs
>>   Permanent disk data:
>> Target:     BIGWORK:MDT0000
>> Index:      0
>> Lustre FS:  BIGWORK
>> Mount type: ldiskfs
>> Flags:      0x65
>>              (MDT MGS first_time update )
>> Persistent mount opts: user_xattr,errors=remount-ro
>> Parameters: mgsnode=10.69.100.5@o2ib
>> 
>> device size = 1116156MB
>> formatting backing filesystem ldiskfs on /dev/vg_mds/mdsmgs
>>        target name  BIGWORK:MDT0000
>>        4k blocks     285735936
>>        options        -J size=400 -I 512 -i 2048 -q -O
>> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
>> lazy_journal_init -F
>> mkfs_cmd = mke2fs -j -b 4096 -L BIGWORK:MDT0000  -J size=400 -I 512 -i
>> 2048 -q -O
>> dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E
>> lazy_journal_init -F /dev/vg_mds/mdsmgs 285735936
>> Writing CONFIGS/mountdata
> 
> And dmesg shows:
>> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
>> quota=on. Opts:
> 
> But "mount" doesn't show any Lustre filesystem mounted, so I run:
>> # mount -t lustre /dev/vg_mds/mdsmgs /lustre/mdsmgs
>> mount.lustre: set /sys/block/dm-2/queue/max_sectors_kb to 127
>> 
>> mount.lustre: set /sys/block/dm-1/queue/max_sectors_kb to 127
>> 
>> mount.lustre: set /sys/block/dm-0/queue/max_sectors_kb to 127
>> 
>> mount.lustre: set /sys/block/sdc/queue/max_sectors_kb to 32767
>> 
>> mount.lustre: set /sys/block/sdd/queue/max_sectors_kb to 32767
>> 
>> mount.lustre: set /sys/block/sde/queue/max_sectors_kb to 32767
>> 
>> mount.lustre: set /sys/block/sdf/queue/max_sectors_kb to 32767
> 
> Now dmesg shows:
>> 
>> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
>> quota=on. Opts:
>> LNet: HW CPU cores: 24, npartitions: 4
>> padlock: VIA PadLock Hash Engine not detected.
>> Lustre: Lustre: Build Version:
>> 2.5.3.90--CHANGED-2.6.32-431.23.3.el6.lustre
>> LNet: Added LNI 10.69.100.5@o2ib [8/256/0/180]
>> LDISKFS-fs (dm-2): mounted filesystem with ordered data mode.
>> quota=on. Opts:
>> Lustre: ctl-BIGWORK-MDT0000: No data found on store. Initialize space
>> Lustre: BIGWORK-MDT0000: new disk, initializing
>> LustreError: 11-0: BIGWORK-MDT0000-lwp-MDT0000: Communicating with
>> 0@lo, operation mds_connect failed with -11.
> 
> So what could be wrong here?
> 
> lsmod lists the following modules (which belong to Lustre):
>> Module                  Size  Used by
>> osp                   242759  1
>> mdd                   284205  3
>> lfsck                 103130  4
>> lod                   263636  3
>> mdt                   746013  4
>> mgs                   281619  1
>> mgc                    82367  2
>> fsfilt_ldiskfs          5865  1
>> osd_ldiskfs           452528  4
>> lquota                345916  11
>> lustre                919263  0
>> mdc                   201643  1
>> lov                   514967  1
>> osc                   392643  1
>> fid                    82230  9
>> fld                    84131  8
>> ko2iblnd              239245  1
>> ptlrpc               1665273  16
>> obdclass             1263221  77
>> lvfs                   16685  19
>> lnet                  344978  4
>> sha512_generic          5198  0
>> sha256_generic         10425  0
>> crc32c_intel            2015  0
>> libcfs                495892  21
>> ldiskfs               425708  3
> 
> 
> 
> lsmod lists the following modules (which belong to InfiniBand):
>> ib_ipoib               80756  0
>> ib_srp                 32208  0
>> scsi_transport_srp      5487  1
>> rdma_ucm               16185  0
>> rdma_cm                38340  2
>> ib_addr                 6606  2
>> iw_cm                   8657  1
>> ib_uverbs              34909  1
>> ib_cm                  36936  3
>> ipv6                  319905  2
>> ib_umad                11686  0
>> mlx4_ib               126642  0
>> ib_sa                  24113  6
>> ib_mad                 39070  4
>> ib_core                74419  12
>> mlx4_core             212574  1
> 
> 
> 
> 
> -- 
> Sven Schumacher - Systemadministrator Tel: (0511)762-2753
> Leibniz Universitaet Hannover
> Institut für Turbomaschinen und Fluid-Dynamik       - TFD
> Appelstraße 9 - 30167 Hannover
> Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
> Callinstraße 36 - 30167 Hannover
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

