[Lustre-discuss] the vanishing lustre disks

Ms. Megan Larko dobsonunit at gmail.com
Sat Oct 25 12:14:33 PDT 2008


Happy Saturday!

Here is an odd problem.   My OSS3 computer (running CentOS 5 and
lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp) which hosts the
OST disks for my lustre disk /crew8 somehow unmounted all -t lustre
disks and removed all lustre modules and is doing a check of the
hardware (Xstore 16-bay JBOD enclosures connected via LSI-8088 cards
to host OSS3).

The /crewdat disk is still mounted on the clients, but not usable:
From "cat /etc/mtab | grep lustre"
ic-mds1@o2ib:/crew8 /crewdat lustre rw 0 0
ic-mds1@o2ib:/crew3 /crew3 lustre rw 0 0
ic-mds1@o2ib:/crew2 /crew2 lustre rw 0 0

The other lustre disks are fine.

On the OSS3 OST host of /crew8:
"df"
[root@oss3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             3.8G  1.4G  2.4G  37% /
none                  7.9G     0  7.9G   0% /dev/shm

"lsmod | grep lustre"  returned nothing.
"modprobe lustre"
Again "lsmod | grep lustre", which now shows:
lustre                469016  0
lov                   303656  1 lustre
mdc                   137080  1 lustre
ptlrpc                659512  4 lustre,lov,lquota,mdc
obdclass              542200  5 lustre,lov,lquota,mdc,ptlrpc
lnet                  255656  4 lustre,ko2iblnd,ptlrpc,obdclass
lvfs                   84712  6 lustre,lov,lquota,mdc,ptlrpc,obdclass
libcfs                183128  9 lustre,lov,lquota,mdc,ko2iblnd,ptlrpc,obdclass,lnet,lvfs
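
The two checks above (lustre mounts and lustre modules) can be repeated in
one go with something like the following. This is only a minimal sketch:
the function names are hypothetical and not part of Lustre, the module
list is simply the set of names seen in the lsmod output above, and it
assumes the usual /proc/mounts and /proc/modules layouts.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of Lustre.

# Print the mount points of type "lustre" found in a mounts table.
# Reads /proc/mounts unless another file (e.g. /etc/mtab) is given.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}

# Print the loaded modules that belong to the stack seen in the lsmod
# output above. Reads /proc/modules unless another file is given.
lustre_modules() {
    awk '$1 ~ /^(lustre|lov|mdc|ptlrpc|obdclass|lnet|lvfs|libcfs|ko2iblnd|lquota)$/ { print $1 }' \
        "${1:-/proc/modules}"
}
```

Sourcing this and calling "lustre_mounts" and "lustre_modules" with no
arguments should print nothing on a host in the state OSS3 was found in.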

For the time period from when I left work Friday (with the system
working) until I arrived again Saturday afternoon,
/var/log/messages shows:
Oct  6 20:38:55 oss3 kernel: RAID1 conf printout:
Oct  6 20:38:55 oss3 kernel:  --- wd:2 rd:2
Oct  6 20:38:55 oss3 kernel:  disk 0, wo:0, o:1, dev:md1
Oct  6 20:38:55 oss3 kernel:  disk 1, wo:0, o:1, dev:md2
Oct 25 13:41:53 oss3 kernel: usb 4-2: new low speed USB device using uhci_hcd and address 2
Oct 25 13:41:54 oss3 kernel: usb 4-2: configuration #1 chosen from 1 choice
Oct 25 13:41:54 oss3 kernel: input: Avocent Dell 03R874 as /class/input/input3
Oct 25 13:41:54 oss3 kernel: input: USB HID v1.10 Keyboard [Avocent Dell 03R874] on usb-0000:00:1d.2-2
Oct 25 13:41:54 oss3 kernel: input: Avocent Dell 03R874 as /class/input/input4
Oct 25 13:41:54 oss3 kernel: input: USB HID v1.10 Mouse [Avocent Dell 03R874] on usb-0000:00:1d.2-2
Oct 25 13:44:07 oss3 kernel: libcfs: no version for "struct_module" found: kernel tainted.
Oct 25 13:44:08 oss3 kernel: Lustre: OBD class driver, info@clusterfs.com
Oct 25 13:44:08 oss3 kernel:         Lustre Version: 1.6.4.3
Oct 25 13:44:08 oss3 kernel:         Build Version: 1.6.4.3-19691231190000-PRISTINE-.tmp.lustre-build.4180.kernel.linux-2.6.18-53.1.13.el5_lustre.1.6.4.3.-2.6.18-53.1.13.el5_lustre.1.6.4.3smp
Oct 25 13:44:09 oss3 kernel: Lustre: Added LNI 172.18.0.14@o2ib [8/64]
Oct 25 13:44:09 oss3 kernel: Lustre: Lustre Client File System; info@clusterfs.com

And that is the end of the /var/log/messages file; nothing afterwards.
Other than my connecting a mouse/keyboard and monitor to the system, I
see nothing out of the ordinary.
And the uptime has been consistent:
[root@oss3 ~]# uptime
 14:35:52 up 21 days, 21:39,  2 users,  load average: 0.00, 0.00, 0.00

NOTE:  The LSI-8088 edge card is doing a sort of filesystem check on
the JBODs attached to the first LSI card.   The LEDs on the front of
six of the JBODs are blinking in accordance with such activity, but I
can find no evidence of such activity on the OSS3 computer.   My
limited understanding of these LSI cards is that they will run a check
on the attached hardware if need be.   What I do not understand is why
all of my lustre disks were unmounted, even those on other, currently
non-blinking LSI cards.   Why/how were the lustre modules removed from
the kernel?   Why was there nothing in the /var/log/messages file to
indicate this?

Has anyone encountered this sort of behavior before?

megan