[Lustre-discuss] Reformatting an Existing Lustre OST Causing Problems

Murshid Azman murshid.azman at gmail.com
Wed Jul 30 02:49:39 PDT 2014


Hello,

I reformatted two OSTs and put them back into Lustre with new names.
Since then, the clients have been behaving strangely: df hangs for about
a minute before it completes.

Here's what I've done chronologically:

1. Deactivated the old OSTs

[root@mds ~]# lctl dl
 17 UP osp lustre-OST0004-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
 35 UP osp lustre-OST0005-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
[root@mds ~]# lctl --device 17 deactivate
[root@mds ~]# lctl --device 35 deactivate
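The device numbers above were read off the `lctl dl` listing by eye. Here is a minimal sketch of looking them up by name instead. It assumes the usual `lctl dl` column layout (index, status, type, name, UUID, refcount). `dev_index` is a hypothetical helper, not part of Lustre:

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print the index that `lctl dl`
# reports for a given device name. Reads the `lctl dl` listing on stdin and
# assumes field 1 is the index and field 4 is the device name.
dev_index() {
    awk -v name="$1" '$4 == name { print $1; exit }'
}

# Usage on the MDS (commented out so the sketch runs without Lustre):
#   lctl dl | dev_index lustre-OST0004-osc-MDT0000   # 17 for the listing above
#   lctl --device "$(lctl dl | dev_index lustre-OST0004-osc-MDT0000)" deactivate
```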

2. Permanently disabled the old OSTs

[root@mds ~]# lctl conf_param lustre-OST0004_UUID.osc.active=0
[root@mds ~]# lctl conf_param lustre-OST0005_UUID.osc.active=0

3. Reformatted the old OSTs as new OSTs with new names (indices 36 and 37)

[root@oss19 ~]# umount /lustre/sdb
[root@oss19 ~]# umount /lustre/sdc
[root@oss19 ~]# mkfs.lustre --reformat --ost --mgsnode=192.168.0.1 --fsname=lustre --index=36 /dev/sdb
[root@oss19 ~]# mkfs.lustre --reformat --ost --mgsnode=192.168.0.1 --fsname=lustre --index=37 /dev/sdc
[root@oss19 ~]# mount /lustre/sdb
[root@oss19 ~]# mount /lustre/sdc
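As far as I know, Lustre encodes the target index in the OST name as four hexadecimal digits. That means `--index=36` and `--index=37` should appear as `lustre-OST0024` and `lustre-OST0025` in `lctl dl`, not as `OST0036`/`OST0037`. Here is a small sketch of that mapping. `ost_name` is a hypothetical helper for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: build the OST name for a filesystem name and a
# decimal index, assuming the index is printed as 4 lowercase hex digits.
ost_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}

ost_name lustre 4    # lustre-OST0004 (one of the old OSTs)
ost_name lustre 36   # lustre-OST0024
ost_name lustre 37   # lustre-OST0025
```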

Since then, the clients log many messages like the following while df
hangs:

Jul 30 10:40:47 client kernel: LustreError: 11-0: lustre-OST0004-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
Jul 30 10:40:47 client kernel: LustreError: Skipped 49 previous similar messages
Jul 30 10:51:12 client kernel: LustreError: 11-0: lustre-OST0005-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
Jul 30 10:51:12 client kernel: LustreError: Skipped 49 previous similar messages

A workaround is to mark the old OSTs as active on the clients and then
mark them inactive again (this does not persist across reboots):

[root@client ~]$ lctl set_param osc.lustre-OST0004-*.active=1
[root@client ~]$ lctl set_param osc.lustre-OST0005-*.active=1
[root@client ~]$ lctl set_param osc.lustre-OST0004-*.active=0
[root@client ~]$ lctl set_param osc.lustre-OST0005-*.active=0
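Because the workaround has to be repeated on every client after each reboot, it can be wrapped in a loop. The sketch below assumes the old OST names are passed as arguments. `LCTL` is a hypothetical override for dry runs (for example `LCTL=echo`), not a Lustre setting. The parameter patterns are quoted so that the shell passes the `*` to `lctl` unexpanded:

```shell
#!/bin/sh
# Sketch of the client-side workaround: activate every listed OST's OSC,
# then deactivate them all again, matching the order used above.
# LCTL is a hypothetical override (e.g. LCTL=echo for a dry run).
LCTL=${LCTL:-lctl}

toggle_inactive_osts() {
    for ost in "$@"; do
        "$LCTL" set_param "osc.${ost}-*.active=1"
    done
    for ost in "$@"; do
        "$LCTL" set_param "osc.${ost}-*.active=0"
    done
}

# On each client (commented out so the sketch runs without Lustre):
#   toggle_inactive_osts lustre-OST0004 lustre-OST0005
```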

From what I can tell, this calls for a writeconf on all servers and
clients:
http://wiki.lustre.org/manual/LustreManual20_HTML/LustreMaintenance.html#50438199_54623
Do you foresee any problems with running this on our production system?

I'm running Lustre 2.5.0 on servers and clients.

Thanks,
Murshid Azman.