[lustre-discuss] unable to connect clients to OST before tunefs
Patricia Santos Marco
psantos@bifi.es
Thu May 19 05:18:54 PDT 2016
Hello Lustre team,
We have an older production cluster running Lustre 1.8.1, with one MDT/MDS server and
two OSS servers. The network card (eth0) of one of the OSS servers (lxsrv3)
failed, so we installed a new one (eth5) and changed the server's
modprobe.conf.local:
options lnet networks="tcp0(eth5)"
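After a change like this, the new NID can be sanity-checked before bringing Lustre back up. The commands below are a sketch (the 192.168.1.44 address is the one that appears later in this message; adjust to the OSS's actual IP):

```shell
# On the OSS whose NIC was replaced (Lustre stopped, modules reloaded):
modprobe lnet
lctl network up     # bring LNet up on the interfaces from modprobe.conf.local
lctl list_nids      # should now show the server's NID on eth5, e.g. 192.168.1.44@tcp

# From another node, check LNet reachability of that NID:
lctl ping 192.168.1.44@tcp
```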
Then we restarted the server, but the Lustre file system was down, so I followed
the manual to regenerate the configuration logs:
1. Shut down the file system in this order:
Unmount the clients.
Unmount the MDT.
Unmount all OSTs.
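In practice the shutdown order amounts to commands like these; the mount points are illustrative (ours may differ), only the order matters:

```shell
# On every client:
umount /lustre

# On the MDS (lxsrv4); the MGS is co-located with the MDT here,
# so unmounting the MDT also stops the MGS:
umount /mnt/mdt      # illustrative mount point

# On each OSS (lxsrv1 and lxsrv3):
umount /mnt/ost0     # illustrative mount points
umount /mnt/ost1
umount /mnt/ost2
```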
2. On the MDT, run:
lxsrv4:~ # tunefs.lustre --writeconf /dev/sda1
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata
Read previous values:
Target: luster-MDT0000
Index: 0
Lustre FS: luster
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: mdt.quota_type=ug2 mdt.group_upcall=/usr/sbin/l_getgroups
Permanent disk data:
Target: luster-MDT0000
Index: 0
Lustre FS: luster
Mount type: ldiskfs
Flags: 0x105
(MDT MGS writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: mdt.quota_type=ug2 mdt.group_upcall=/usr/sbin/l_getgroups
Writing CONFIGS/mountdata
3. On the OSTs, run:
lxsrv1:
tunefs.lustre --writeconf /dev/sda
tunefs.lustre --writeconf /dev/sdb
tunefs.lustre --writeconf /dev/sdc
lxsrv3 (this is the server whose network card failed):
tunefs.lustre --writeconf /dev/disk/by-id/scsi-3600605b000a79eb011b131b81830ddc0
tunefs.lustre --writeconf /dev/disk/by-id/scsi-3600605b000a79eb011bfa4ad1861dd29
tunefs.lustre --writeconf /dev/disk/by-id/scsi-3600605b000a79eb011b131b81831b1de
4. Restart the file system in this order:
Mount the MGS.
Mount the MDT.
Mount the OSTs.
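Concretely, the remount went in this order; since the MGS is co-located with the MDT on lxsrv4, mounting the MDT device also starts the MGS. Device names are the ones from steps 2 and 3 above; mount points are illustrative:

```shell
# On lxsrv4 (MGS + MDT), using the device from step 2:
mount -t lustre /dev/sda1 /mnt/mdt

# On lxsrv1 (OSS):
mount -t lustre /dev/sda /mnt/ost0
mount -t lustre /dev/sdb /mnt/ost1
mount -t lustre /dev/sdc /mnt/ost2

# On lxsrv3 (the repaired OSS), using the by-id devices from step 3:
mount -t lustre /dev/disk/by-id/scsi-3600605b000a79eb011b131b81830ddc0 /mnt/ost3
mount -t lustre /dev/disk/by-id/scsi-3600605b000a79eb011bfa4ad1861dd29 /mnt/ost4
mount -t lustre /dev/disk/by-id/scsi-3600605b000a79eb011b131b81831b1de /mnt/ost5

# Finally, on each client ("luster" is the fs name from the tunefs output):
mount -t lustre 192.168.1.248@tcp:/luster /lustre
```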
5. Now the MDT can see all the OSTs. However, when I start a client:
node43:~ # lfs df
UUID 1K-blocks Used Available Use% Mounted on
luster-MDT0000_UUID 255466784 4921796 235945520 1% /lustre[MDT:0]
luster-OST0003_UUID 1152952228 527535012 566850548 45% /lustre[OST:3]
luster-OST0004_UUID 1152952228 607482548 486903012 52% /lustre[OST:4]
luster-OST0005_UUID 1152952228 660191440 434194120 57% /lustre[OST:5]
filesystem summary: 3458856684 1795209000 1487947680 51% /lustre
The client can't connect to OST0000, OST0001, and OST0002:
lctl ping 192.168.1.249@tcp
failed to ping 192.168.1.44@tcp: Input/output error
lctl dl
0 UP mgc MGC192.168.1.248@tcp 1d2aa343-6ae6-f6d9-0637-7c10b52a0569 5
1 UP lov luster-clilov-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 4
2 UP mdc luster-MDT0000-mdc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
3 UP osc luster-OST0003-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
4 UP osc luster-OST0004-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
5 UP osc luster-OST0005-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
6 UP osc luster-OST0000-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
7 UP osc luster-OST0001-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
8 UP osc luster-OST0002-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
9 UP lov luster-clilov-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 4
10 UP mdc luster-MDT0000-mdc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
11 UP osc luster-OST0003-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
12 UP osc luster-OST0004-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
13 UP osc luster-OST0005-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
14 UP osc luster-OST0000-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
15 UP osc luster-OST0001-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
16 UP osc luster-OST0002-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
The client is showing two sets of Lustre devices!
If I unmount Lustre:
umount /lustre
lctl dl
0 UP mgc MGC192.168.1.248@tcp 31d0a0ed-7df2-7bc6-1b3c-d799685e1e1a 5
1 UP lov luster-clilov-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 4
2 UP mdc luster-MDT0000-mdc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
3 UP osc luster-OST0003-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
4 UP osc luster-OST0004-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
5 UP osc luster-OST0005-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
6 UP osc luster-OST0000-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
7 UP osc luster-OST0001-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
8 UP osc luster-OST0002-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
14 ST osc luster-OST0000-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2
15 ST osc luster-OST0001-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2
16 ST osc luster-OST0002-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2
lfs df (shows nothing)
I unmount Lustre again:
lctl dl
6 ST osc luster-OST0000-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
7 ST osc luster-OST0001-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
8 ST osc luster-OST0002-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
14 ST osc luster-OST0000-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
15 ST osc luster-OST0001-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
16 ST osc luster-OST0002-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
And if I run a ping from the OST server:
lxsrv3> lctl ping 192.168.1.44@tcp
And then I mount Lustre on the client again:
lctl dl
0 UP mgc MGC192.168.1.248@tcp dd51f58e-c580-b2eb-18f8-6f950aa272bb 5
1 UP lov luster-clilov-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 4
2 UP mdc luster-MDT0000-mdc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
3 UP osc luster-OST0003-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
4 UP osc luster-OST0004-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
5 UP osc luster-OST0005-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
6 UP osc luster-OST0000-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
7 UP osc luster-OST0001-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
8 UP osc luster-OST0002-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
lfs df
UUID 1K-blocks Used Available Use% Mounted on
luster-MDT0000_UUID 255466784 4921812 235945504 1% /lustre[MDT:0]
luster-OST0000_UUID 2880829872 1395149424 1339342592 48% /lustre[OST:0]
luster-OST0001_UUID 2880829872 1323430808 1411061208 45% /lustre[OST:1]
luster-OST0002_UUID 2880829872 1348980996 1385511020 46% /lustre[OST:2]
luster-OST0003_UUID 1152952228 527535024 566850536 45% /lustre[OST:3]
luster-OST0004_UUID 1152952228 607482556 486903004 52% /lustre[OST:4]
luster-OST0005_UUID 1152952228 660191444 434194116 57% /lustre[OST:5]
filesystem summary: 12101346300 5862770252 5623862476 48% /lustre
And voilà, Lustre works again. However, if I reboot the client, the
problem comes back.
What should I do to fix it?
Thanks!!
--
--------------------------------------------------------
Patricia Santos Marco
HPC research group System Administrator
Instituto de Biocomputación y Física de Sistemas Complejos (BIFI)
Universidad de Zaragoza
e-mail: psantos@bifi.es
phone: (+34) 976762992
http://bifi.es/~patricia/