[Lustre-discuss] OST external journals & an lbug

Hendelman, Rob Rob.Hendelman at magnetar.com
Tue Feb 24 19:38:42 PST 2009


Hi everyone,

We have 2 OSSs, each with 5 1TB OSTs, that share LUNs on our SAN.

OST0-4 are on server3
OST5-9 are on server4

Each OST is 1TB with an external journal.

Server3 crashed HARD (as in it wouldn't POST after power off, wait 30
seconds, power on) and we were told by the vendor that the motherboard
died.

In the meantime I attempted to mount its OSTs on server4.  Server3
was powered off before attempting this (STONITH theory, right?)

I ran into lots of problems and hit a few LBUGs, specifically:

LustreError: 11283:0: (tracefile.c:431:libcfs_assertion_failed()) LBUG
LustreError: 8095:0: (tracefile.c:431:libcfs_assertion_failed()) LBUG

We are running an older Lustre version
(lustre-1.6.4.3-2.6.18_53.1.13.el5_lustre.1.6.4.3smp) on CentOS 5.2
boxes, with the matching e2fsck, utilities, etc. from the
appropriate download page on the Sun website.

I had major problems getting the remaining Lustre server to mount the
new OSTs because of apparent journal problems.  I kept hitting "LDISKFS:
failed to claim external journal device" when trying to mount the OSTs
as type ldiskfs.  Trying to mount them as type lustre gave me an error
-22.

The way I fixed it was by taking the following steps:

* fsck /path/to/block/device/of/ost-data (this seemed to pick up the
journal correctly)
* ls -la /path/to/block/device/of/journal-dev (the journal device of
ost-data), which gives output such as:
brw-rw---- 1 root disk 253, 7 Feb 24 20:31
/path/to/block/device/of/journal-dev
* mount -t ldiskfs -o journal_dev=0xFD07
/path/to/block/device/of/ost-data /mnt/tmp-mt-pt
(FD=253 in hex, 07 = 7 in hex)
* umount /mnt/tmp-mt-pt
* mount -t lustre /path/to/block/device/of/ost-data
/mnt/normal-mountpoint-of-ost
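
For anyone repeating the steps above, here is a rough sketch of how the
journal_dev value can be derived from the major/minor numbers that ls
reports. It assumes the kernel packs the device number as
(major << 8) | minor for minors below 256, with higher minor bits shifted
up by 12, which is consistent with 253, 7 giving 0xFD07. The function
names are my own and not part of any Lustre or e2fsprogs tool.

```python
import os
import stat


def encode_dev(major: int, minor: int) -> int:
    """Pack major/minor into a single device number.

    Assumed layout: low 8 bits of minor, then major starting at bit 8,
    then the remaining minor bits starting at bit 20.
    """
    return (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)


def journal_dev_option(major: int, minor: int) -> str:
    """Build the journal_dev=... mount option string in hex."""
    return "journal_dev=0x%X" % encode_dev(major, minor)


def journal_dev_option_for(path: str) -> str:
    """Same as above, but read major/minor from a block device node."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("%s is not a block device" % path)
    return journal_dev_option(os.major(st.st_rdev), os.minor(st.st_rdev))
```

Calling journal_dev_option(253, 7) gives the string used in the mount
command above; journal_dev_option_for() takes the journal device path
instead, so the numbers do not have to be copied from ls by hand.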

My questions:
1) Since the MDS did not crash, but half the OSTs did, do I need to
make any changes to the MDS?

2) Any idea why e2fsck can figure out the journal device automatically
but Lustre cannot (at least until I manually mount/unmount as type
ldiskfs and specify the journal major/minor device numbers by hand)?

3) Is the LBUG above fixed in a newer version of Lustre?  If there is
not enough information here, what should I capture next time to get you
everything you need?

Thanks,

Rob

