[Lustre-discuss] OST index 0 missing

Ms. Megan Larko dobsonunit at gmail.com
Fri Jul 17 08:01:28 PDT 2009


Hi Again,

This is a follow-on to thread "One Lustre Client lost One Lustre Disk".

I had lost one of two Lustre disks after a client reboot. The disk
itself seemed fine; it just would not mount. This happened on two
clients. I thought I might need to update the Lustre disk information
on the MGS, because I had recently moved the non-MGS disk successfully
and the restore procedure used a tunefs.lustre --writeconf..... command
to update the disk's configuration. I documented this in the "One
Lustre Client lost One Lustre Disk--solved" email to the Lustre list.

Well, perhaps not quite so "solved". My users have noticed that some
files on the remounted Lustre disk are inaccessible. More unusual, the
set of inaccessible files is not consistent: it could be one file now
and a different file 10 minutes later.
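To pin down which files fail and when, the same list of files can be probed repeatedly and the failures recorded per round. This is a minimal Python sketch under my own assumptions, not anything from Lustre: the function names are hypothetical, and a regular file counts as "inaccessible" if stat() or opening it for reading raises an OSError (which is how an I/O error from the Lustre client surfaces in Python).

```python
import os
import time


def inaccessible_files(paths: list[str]) -> list[str]:
    """Return the paths that currently fail stat() or open-for-read."""
    failed = []
    for path in paths:
        try:
            os.stat(path)
            with open(path, "rb") as f:
                f.read(1)  # read one byte so the data path is exercised, not just the lookup
        except OSError:
            failed.append(path)
    return failed


def watch(paths: list[str], rounds: int, interval_s: float) -> list[tuple[float, list[str]]]:
    """Probe the same paths `rounds` times, `interval_s` seconds apart.

    Returns (timestamp, failing paths) for each round, so a file that fails
    in one round and succeeds in the next shows up side by side.
    """
    history = []
    for i in range(rounds):
        history.append((time.time(), inaccessible_files(paths)))
        if i < rounds - 1:
            time.sleep(interval_s)  # no sleep after the last round
    return history
```

For example, watch(paths, rounds=6, interval_s=600) would take six probes 10 minutes apart over a (hypothetical) list of files on the affected Lustre disk.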

To start, the system is CentOS 5.1, running kernel 2.6.18-53.1.13.el5
on the clients and 2.6.18-53.1.13.el5_lustre.1.6.4.3smp on the MGS/MDS
and OSS. There are no errors in /var/log/messages on either the OSS or
the MGS/MDS; the only entries are my routine ntpd timestamps.

On the Lustre clients, I get the following errors in /var/log/messages
when I access the disk on which I ran tunefs.lustre yesterday:

Jul 17 09:56:58 cn2 kernel: LustreError: 13910:0:(lov_ea.c:228:lsm_unpackmd_plain()) OST index 0 missing
Jul 17 09:56:58 cn2 kernel: LustreError: 13910:0:(lov_ea.c:228:lsm_unpackmd_plain()) Skipped 1 previous similar message
Jul 17 09:56:58 cn2 kernel: Lustre: 13910:0:(lov_pack.c:47:lov_dump_lmm_v1()) objid 0x36716c8, magic 0x0bd10bd0, pattern 0x1
Jul 17 09:56:58 cn2 kernel: Lustre: 13910:0:(lov_pack.c:50:lov_dump_lmm_v1()) stripe_size 1048576, stripe_count 1
Jul 17 09:56:58 cn2 kernel: Lustre: 13910:0:(lov_pack.c:56:lov_dump_lmm_v1()) stripe 0 idx 0 subobj 0x0/0x78ac47
Jul 17 09:57:00 cn2 kernel: Lustre: 3613:0:(lov_pack.c:47:lov_dump_lmm_v1()) objid 0x36716c8, magic 0x0bd10bd0, pattern 0x1
Jul 17 09:57:00 cn2 kernel: Lustre: 3613:0:(lov_pack.c:50:lov_dump_lmm_v1()) stripe_size 1048576, stripe_count 1
Jul 17 09:57:00 cn2 kernel: Lustre: 3613:0:(lov_pack.c:56:lov_dump_lmm_v1()) stripe 0 idx 0 subobj 0x0/0x78ac47
Jul 17 09:57:00 cn2 kernel: Lustre: 13912:0:(lov_pack.c:47:lov_dump_lmm_v1()) objid 0x36716c8, magic 0x0bd10bd0, pattern 0x1
Jul 17 09:57:00 cn2 kernel: Lustre: 13912:0:(lov_pack.c:50:lov_dump_lmm_v1()) stripe_size 1048576, stripe_count 1
Jul 17 09:57:00 cn2 kernel: Lustre: 13912:0:(lov_pack.c:56:lov_dump_lmm_v1()) stripe 0 idx 0 subobj 0x0/0x78ac47
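All three lov_dump_lmm_v1() dumps in the log excerpt describe the same object (objid 0x36716c8) with a single stripe on OST index 0 (subobj 0x0/0x78ac47). A short Python sketch that pulls object ids and stripe placements out of such log text makes that easier to check across a larger /var/log/messages. The regular expression is an assumption written only against the lines shown here; \s+ is used so wrapped syslog lines still match, and the sketch assumes lines from one dump are not interleaved with another process's dump.

```python
import re

# Matches either the "objid ..." line or a "stripe N idx M subobj G/O" line
# from lov_dump_lmm_v1(). "stripe_size"/"stripe_count" do not match because
# the pattern requires whitespace right after "stripe".
TOKEN_RE = re.compile(
    r"objid\s+(?P<objid>0x[0-9a-f]+)"
    r"|stripe\s+(?P<stripe>\d+)\s+idx\s+(?P<idx>\d+)\s+subobj\s+(?P<subobj>0x[0-9a-f]+/0x[0-9a-f]+)"
)


def layouts(log_text: str) -> dict[str, set[tuple[int, str]]]:
    """Map each dumped objid to the (OST index, subobj) pairs of its stripes."""
    result: dict[str, set[tuple[int, str]]] = {}
    current = None
    for m in TOKEN_RE.finditer(log_text):
        if m.group("objid"):
            # A new layout dump starts; following stripe lines belong to it.
            current = m.group("objid")
            result.setdefault(current, set())
        elif current is not None:
            result[current].add((int(m.group("idx")), m.group("subobj")))
    return result
```

Run on the excerpt above, layouts() returns {"0x36716c8": {(0, "0x0/0x78ac47")}}, i.e. one file whose only stripe lives on OST index 0.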

A Google search linked the error message "OST index 0 missing" to a
missing piece of OST hardware in a Lustre disk. That is not the case
here: checking the RAID array on the OSS for hardware errors shows
none.
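The wording of the message, and the layout dump that follows it, suggest that "missing" may describe the client's view of its OST targets rather than disk hardware. My reading of lsm_unpackmd_plain() in the 1.6 sources (an assumption on my part) is roughly: for each stripe in the file's layout, the client looks up the stripe's OST index in its own table of configured OST targets and fails if that slot is empty. The Python below is only a simplified model of that check, not Lustre code; check_layout() and the target names are hypothetical.

```python
from typing import Optional


def check_layout(stripe_ost_indices: list[int],
                 client_targets: list[Optional[str]]) -> list[str]:
    """Simplified model (not Lustre code) of validating a file layout on a client.

    stripe_ost_indices: the "idx" value of each stripe, as in lov_dump_lmm_v1().
    client_targets: the client's OST table; slot i holds a target name, or None
    if no target is configured at index i.
    """
    errors = []
    for idx in stripe_ost_indices:
        if idx >= len(client_targets):
            # Index beyond the end of the table: modelled as a separate error.
            errors.append(f"OST index {idx} more than OST count {len(client_targets)}")
        elif client_targets[idx] is None:
            # Slot exists but no target is registered there: the case in the log.
            errors.append(f"OST index {idx} missing")
    return errors


# Hypothetical example: the file's only stripe is on index 0 (as in the log),
# but the client's table has no target in slot 0.
print(check_layout([0], [None, "fs-OST0001_UUID"]))  # ['OST index 0 missing']
```

In this model the disk hardware never enters the picture, which would be consistent with the RAID array reporting no errors.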

I am attaching two screenshots (abrfc.png and ncdump.png) that show a
file that is not present and then is present a few minutes later,
while a different file is not accessible. This seems to affect only a
small number of the files on the Lustre disk, particularly text files.

Is this "OST index 0 missing" situation fixable? If yes, how? Should I
mount the Lustre disk read-only for now?


Any advice is genuinely appreciated.

Megan Larko

-------------- next part --------------
A non-text attachment was scrubbed...
Name: abrfc.png
Type: image/png
Size: 1083331 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090717/55f9d97e/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ncdump.png
Type: image/png
Size: 973329 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090717/55f9d97e/attachment-0001.png>

