[Lustre-discuss] Reformatting an Existing Lustre OST Causing Problems
Murshid Azman
murshid.azman at gmail.com
Wed Jul 30 02:49:39 PDT 2014
Hello,
I reformatted two existing OSTs and put them back into Lustre under new names.
Since then the clients have been behaving strangely: df hangs on them and
takes about a minute to complete.
Here is what I did, in chronological order:
1. Deactivated the old OSTs
[root@mds ~]# lctl dl
17 UP osp lustre-OST0004-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
35 UP osp lustre-OST0005-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
[root@mds ~]# lctl --device 17 deactivate
[root@mds ~]# lctl --device 35 deactivate
2. Permanently disabled the old OSTs
[root@mds ~]# lctl conf_param lustre-OST0004_UUID.osc.active=0
[root@mds ~]# lctl conf_param lustre-OST0005_UUID.osc.active=0
3. Reformatted the old OSTs into new OSTs with new names
[root@oss19 ~]# umount /lustre/sdb
[root@oss19 ~]# umount /lustre/sdc
[root@oss19 ~]# mkfs.lustre --reformat --ost --mgsnode=192.168.0.1 --fsname=lustre --index=36 /dev/sdb
[root@oss19 ~]# mkfs.lustre --reformat --ost --mgsnode=192.168.0.1 --fsname=lustre --index=37 /dev/sdc
[root@oss19 ~]# mount /lustre/sdb
[root@oss19 ~]# mount /lustre/sdc
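Lustre target names carry the index as four hexadecimal digits, so the decimal values given to --index above do not appear literally in the new names. A minimal sketch of the mapping follows; ost_name is a hypothetical helper for illustration, not a Lustre tool.

```shell
# Sketch: derive the OST target name from a file system name and a decimal index.
# ost_name is a hypothetical helper, not part of Lustre.
ost_name() {
    fsname=$1
    index=$2
    # The index is rendered as four lowercase hexadecimal digits.
    printf '%s-OST%04x\n' "$fsname" "$index"
}

ost_name lustre 36   # lustre-OST0024
ost_name lustre 37   # lustre-OST0025
ost_name lustre 4    # lustre-OST0004
```

Under that assumption the reformatted devices should register as lustre-OST0024 and lustre-OST0025, while the configuration still refers to the old targets as OST0004 and OST0005.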
After these steps, the clients log a lot of messages like the following while
df hangs:
Jul 30 10:40:47 client kernel: LustreError: 11-0: lustre-OST0004-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
Jul 30 10:40:47 client kernel: LustreError: Skipped 49 previous similar messages
Jul 30 10:51:12 client kernel: LustreError: 11-0: lustre-OST0005-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
Jul 30 10:51:12 client kernel: LustreError: Skipped 49 previous similar messages
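On Linux, error 19 is ENODEV ("No such device"), which fits clients still trying to reach targets that no longer exist on the OSS. The sketch below shows one way to summarise such messages per target. failed_targets is a hypothetical helper, it relies on grep -o (available in GNU and BSD grep), and the pattern is only matched against the message format quoted above.

```shell
# Sketch: count failed connection attempts per OST target and error code.
# failed_targets is a hypothetical helper; it reads syslog text on stdin.
failed_targets() {
    # Join wrapped lines, extract each failure message, reduce it to
    # "<target> <errno>", then count identical pairs.
    tr '\n' ' ' |
        grep -oE '[A-Za-z0-9_]+-OST[0-9a-f]{4}-osc-[0-9a-f]+: Communicating with [^,]+, operation [a-z_]+ failed with -[0-9]+' |
        sed -E 's/^([A-Za-z0-9_]+-OST[0-9a-f]{4})-osc-.* failed with (-[0-9]+)$/\1 \2/' |
        sort | uniq -c |
        awk '{ print $2, "errno", $3, "count", $1 }'
}

failed_targets <<'EOF'
Jul 30 10:40:47 client kernel: LustreError: 11-0: lustre-OST0004-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
Jul 30 10:40:47 client kernel: LustreError: Skipped 49 previous similar messages
Jul 30 10:51:12 client kernel: LustreError: 11-0: lustre-OST0005-osc-ffff8804384d5c00: Communicating with 192.168.0.19@tcp, operation ost_connect failed with -19.
EOF
```

The counts only cover messages that were actually logged; the "Skipped 49 previous similar messages" lines are not added in.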
A workaround is to mark the old OSTs as active on the clients and then mark
them inactive again (this does not persist across reboots):
[root@client ~]$ lctl set_param osc.lustre-OST0004-*.active=1
[root@client ~]$ lctl set_param osc.lustre-OST0005-*.active=1
[root@client ~]$ lctl set_param osc.lustre-OST0004-*.active=0
[root@client ~]$ lctl set_param osc.lustre-OST0005-*.active=0
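Since the workaround has to be repeated on every client after each reboot, it could be wrapped in a small function. This is only a sketch: toggle_osts and the RUN variable are hypothetical names, and by default the function prints the lctl commands instead of running them.

```shell
# Sketch: activate and then deactivate a list of OSTs on a client.
# toggle_osts and RUN are hypothetical; set RUN=1 to really call lctl.
toggle_osts() {
    fsname=$1
    shift
    for state in 1 0; do
        for ost in "$@"; do
            param="osc.${fsname}-${ost}-*.active=${state}"
            if [ "${RUN:-0}" = 1 ]; then
                lctl set_param "$param"        # needs root on a Lustre client
            else
                echo "lctl set_param $param"   # dry run: show the command only
            fi
        done
    done
}

toggle_osts lustre OST0004 OST0005
```

In dry-run mode this prints the same four commands as above, in the same order.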
Looking at this issue, I think a writeconf is required, which means taking
all servers and clients offline:
http://wiki.lustre.org/manual/LustreManual20_HTML/LustreMaintenance.html#50438199_54623
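For reference, the order of operations for a writeconf, as I understand the manual section linked above, can be sketched as a dry-run plan. writeconf_plan and every device path passed to it are hypothetical placeholders, and the steps should be checked against the manual for the release in use before anything is run.

```shell
# Sketch: print the order of operations for regenerating the configuration logs.
# writeconf_plan and the device paths are hypothetical; nothing is executed.
writeconf_plan() {
    mdt_dev=$1
    shift
    echo "1. unmount all clients, then the MDT, then every OST"
    echo "2. on the MDS: tunefs.lustre --writeconf $mdt_dev"
    for ost_dev in "$@"; do
        echo "3. on the OSS: tunefs.lustre --writeconf $ost_dev"
    done
    echo "4. remount in order: MGS/MDT, then OSTs, then clients"
}

writeconf_plan /dev/mdt_device /dev/sdb /dev/sdc
```

The point of the sketch is the ordering: the MDT is rewritten before the OSTs, and the file system is remounted servers first, clients last.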
Do you foresee any problems running this on our production system?
I'm running Lustre 2.5.0 on servers and clients.
Thanks,
Murshid Azman.