[lustre-discuss] unable to connect clients to OST before tunefs

Patricia Santos Marco psantos at bifi.es
Thu May 19 05:18:54 PDT 2016


Hello Lustre team,

We have an older production cluster running Lustre 1.8.1, with one MDT/MDS
server and two OST/OSS servers. The network card (eth0) of one of the OSS
servers (lxsrv3) failed, so we installed a new one (eth5) and updated the
server's modprobe.conf.local:

options lnet networks="tcp0(eth5)"
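After changing that option, it is worth checking on lxsrv3 that LNet actually comes up on eth5 with the NID the MGS and the clients expect. A sketch (the IP is the one that appears in the ping output later in this message):

```shell
# With Lustre stopped, reload the modules so the new
# "networks" setting takes effect:
lustre_rmmod
modprobe lnet
lctl network up

# The NID lxsrv3 now advertises:
lctl list_nids          # expect something like 192.168.1.44@tcp

# From another node, confirm LNet reachability:
lctl ping 192.168.1.44@tcp
```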

Then we restarted the server, but the Lustre file system was down, so I
followed the manual to regenerate the configuration logs:

1. Shut down the file system in this order:
   Unmount the clients.
   Unmount the MDT.
   Unmount all OSTs.
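For reference, on our setup the shutdown order comes down to roughly this (a sketch; the host and device names are the ones from this message, the server-side mount points are placeholders):

```shell
# 1. On every client:
umount /lustre

# 2. On the MDS (lxsrv4) -- the MGS lives on the same device here:
umount /mnt/mdt       # placeholder mount point

# 3. On each OSS (lxsrv1, lxsrv3), one umount per OST:
umount /mnt/ost0      # placeholder mount point, repeat per OST
```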


2. On the MDT, run:

lxsrv4:~ # tunefs.lustre --writeconf /dev/sda1


checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     luster-MDT0000
Index:      0
Lustre FS:  luster
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: mdt.quota_type=ug2 mdt.group_upcall=/usr/sbin/l_getgroups


   Permanent disk data:
Target:     luster-MDT0000
Index:      0
Lustre FS:  luster
Mount type: ldiskfs
Flags:      0x105
              (MDT MGS writeconf )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr,acl
Parameters: mdt.quota_type=ug2 mdt.group_upcall=/usr/sbin/l_getgroups

Writing CONFIGS/mountdata

3. On the OSTs, run:

lxsrv1:
tunefs.lustre --writeconf /dev/sda
tunefs.lustre --writeconf /dev/sdb
tunefs.lustre --writeconf /dev/sdc

lxsrv3 (this is the server whose card failed):

tunefs.lustre --writeconf
/dev/disk/by-id/scsi-3600605b000a79eb011b131b81830ddc0
tunefs.lustre --writeconf
/dev/disk/by-id/scsi-3600605b000a79eb011bfa4ad1861dd29
tunefs.lustre --writeconf
/dev/disk/by-id/scsi-3600605b000a79eb011b131b81831b1de

4. Restart the file system in this order:
   Mount the MGS.
   Mount the MDT.
   Mount the OSTs.
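In our case the MGS and MDT are combined on lxsrv4's /dev/sda1 (the flags above show "MDT MGS"), so the restart amounts to roughly the following (mount points are placeholders):

```shell
# On lxsrv4: mounting the MDT device also starts the co-located MGS.
mount -t lustre /dev/sda1 /mnt/mdt

# On lxsrv1 (and similarly lxsrv3 with its /dev/disk/by-id paths):
mount -t lustre /dev/sda /mnt/ost0

# On the clients, against the MGS NID and the fsname "luster":
mount -t lustre 192.168.1.248@tcp:/luster /lustre
```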

5. Now the MDT can see all OSTs. However, when I start a client:

 node43:~ # lfs df
UUID                 1K-blocks      Used Available  Use% Mounted on
luster-MDT0000_UUID  255466784   4921796 235945520    1% /lustre[MDT:0]
luster-OST0003_UUID  1152952228 527535012 566850548   45% /lustre[OST:3]
luster-OST0004_UUID  1152952228 607482548 486903012   52% /lustre[OST:4]
luster-OST0005_UUID  1152952228 660191440 434194120   57% /lustre[OST:5]

filesystem summary:  3458856684 1795209000 1487947680   51% /lustre

The client cannot connect to OST0000, OST0001, or OST0002:

lctl ping 192.168.1.249@tcp
failed to ping 192.168.1.44@tcp: Input/output error



lctl dl
  0 UP mgc MGC192.168.1.248@tcp 1d2aa343-6ae6-f6d9-0637-7c10b52a0569 5
  1 UP lov luster-clilov-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 4
  2 UP mdc luster-MDT0000-mdc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  3 UP osc luster-OST0003-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  4 UP osc luster-OST0004-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  5 UP osc luster-OST0005-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  6 UP osc luster-OST0000-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  7 UP osc luster-OST0001-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  8 UP osc luster-OST0002-osc-ffff81046e4be400
34862c9c-2cfc-2de0-5e6d-247dd4953a13 5
  9 UP lov luster-clilov-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 4
 10 UP mdc luster-MDT0000-mdc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 11 UP osc luster-OST0003-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 12 UP osc luster-OST0004-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 13 UP osc luster-OST0005-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 14 UP osc luster-OST0000-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 15 UP osc luster-OST0001-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5
 16 UP osc luster-OST0002-osc-ffff81046ed3a400
fc806d37-990e-5e36-0952-1abf92e6b2cc 5

The device list shows two Lustre client instances!

If I unmount Lustre:


umount /lustre
lctl dl
  0 UP mgc MGC192.168.1.248@tcp 31d0a0ed-7df2-7bc6-1b3c-d799685e1e1a 5
  1 UP lov luster-clilov-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 4
  2 UP mdc luster-MDT0000-mdc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  3 UP osc luster-OST0003-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  4 UP osc luster-OST0004-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  5 UP osc luster-OST0005-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  6 UP osc luster-OST0000-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  7 UP osc luster-OST0001-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
  8 UP osc luster-OST0002-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 5
 14 ST osc luster-OST0000-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2
 15 ST osc luster-OST0001-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2
 16 ST osc luster-OST0002-osc-ffff8104702b2400
d1a75733-a7c4-c7b4-b410-f6daaccf1869 2

 lfs df now prints nothing.

I unmount Lustre again:

lctl dl
  6 ST osc luster-OST0000-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
  7 ST osc luster-OST0001-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
  8 ST osc luster-OST0002-osc-ffff81047039b800
2f0d17b1-906c-59c1-2086-29970887c33f 2
 14 ST osc luster-OST0000-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
 15 ST osc luster-OST0001-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
 16 ST osc luster-OST0002-osc-ffff81046f414c00
802a6b2d-6954-b924-5556-1f3112b030f3 2
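After the unmounts, the leftover entries are marked "ST" (stopping) rather than "UP" in `lctl dl`, and they are exactly the OSCs for lxsrv3's three OSTs. A small filter like the one below, written here against inlined sample output rather than a live client, pulls out just the stale devices:

```shell
#!/bin/sh
# List stale ("ST") Lustre devices from `lctl dl`-style output.
# Sample output is inlined for illustration; on a real client you
# would pipe `lctl dl` in instead.
sample='  6 ST osc luster-OST0000-osc-ffff81047039b800 2f0d17b1-906c-59c1-2086-29970887c33f 2
  7 UP osc luster-OST0003-osc-ffff81047039b800 2f0d17b1-906c-59c1-2086-29970887c33f 5
  8 ST osc luster-OST0001-osc-ffff81047039b800 2f0d17b1-906c-59c1-2086-29970887c33f 2'

# Print the device number and name of every stale entry.
printf '%s\n' "$sample" | awk '$2 == "ST" { print $1, $4 }'
```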

And if I run a ping from the OST server:

 lxsrv3> lctl ping 192.168.1.44@tcp

And then I mount Lustre on the client again:


 lctl dl
  0 UP mgc MGC192.168.1.248@tcp dd51f58e-c580-b2eb-18f8-6f950aa272bb 5
  1 UP lov luster-clilov-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 4
  2 UP mdc luster-MDT0000-mdc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  3 UP osc luster-OST0003-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  4 UP osc luster-OST0004-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  5 UP osc luster-OST0005-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  6 UP osc luster-OST0000-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  7 UP osc luster-OST0001-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5
  8 UP osc luster-OST0002-osc-ffff810467f2d000
3beb05fe-eaa5-1b21-e416-6caa965f2d1a 5


lfs df
UUID                 1K-blocks      Used Available  Use% Mounted on
luster-MDT0000_UUID  255466784   4921812 235945504    1% /lustre[MDT:0]
luster-OST0000_UUID  2880829872 1395149424 1339342592   48% /lustre[OST:0]
luster-OST0001_UUID  2880829872 1323430808 1411061208   45% /lustre[OST:1]
luster-OST0002_UUID  2880829872 1348980996 1385511020   46% /lustre[OST:2]
luster-OST0003_UUID  1152952228 527535024 566850536   45% /lustre[OST:3]
luster-OST0004_UUID  1152952228 607482556 486903004   52% /lustre[OST:4]
luster-OST0005_UUID  1152952228 660191444 434194116   57% /lustre[OST:5]

filesystem summary:  12101346300 5862770252 5623862476   48% /lustre

And voilà, Lustre works again. However, if I reboot the client, the
problem comes back.
What should I do to fix it?

Thanks!!





-- 

--------------------------------------------------------

Patricia Santos Marco

HPC research group System Administrator

Instituto de Biocomputación y Física de Sistemas Complejos (BIFI)

Universidad de Zaragoza

e-mail: psantos at bifi.es

phone: (+34) 976762992

http://bifi.es/~patricia/

