[Lustre-discuss] OST targets not mountable after disabling/enabling MMP

Edward Walter ewalter at cs.cmu.edu
Mon Aug 9 06:44:31 PDT 2010


Hello List,

We recently experienced a power failure (and a subsequent UPS failure) 
which caused our Lustre filesystem to shut down hard.  We were able to 
bring it back online but then started seeing errors where the OSTs were 
being remounted read-only.  We observed that all of the read-only OSTs 
were reporting an I/O error on the same block (the MMP block) and 
generating the following messages:

> Lustre: Server data-OST0004 on device /dev/sdd has started
> end_request: I/O error, dev sdd, sector 861112
> Buffer I/O error on device sdd, logical block 107639
> lost page write due to I/O error on sdd
> LDISKFS-fs error (device sdd): kmmpd: Error writing to MMP block
> end_request: I/O error, dev sdd, sector 0
> Buffer I/O error on device sdd, logical block 0
> lost page write due to I/O error on sdd
> LDISKFS-fs warning (device sdd): kmmpd: kmmpd being stopped since 
> filesystem has been remounted as readonly.
> end_request: I/O error, dev sdd, sector 861112
> Buffer I/O error on device sdd, logical block 107639
> lost page write due to I/O error on sdd
We do have our OSTs set up for failover, but we manage access 
through the shared RAID array itself (using LUN fencing), so we don't 
need the MMP feature.

We disabled MMP using tune2fs (tune2fs -O ^mmp /dev/sdd) on one set of 
OSTs.  When we tried to mount these OSTs, we received a message that the 
volume could not be mounted because MMP was not enabled.  We 
subsequently re-enabled MMP (tune2fs -O mmp /dev/sdd).  Oddly, this did 
not print the usual message indicating the MMP interval and block 
number.  Running 'tune2fs -l' indicates that MMP is enabled on the 
volume, though.  We also observed that OST volumes we disabled MMP on 
now report MMP as enabled even though we never re-enabled it on them.
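For reference, the disable/re-enable/verify sequence we ran on each 
affected OST was roughly the following (device name as on one of our 
OSSes; the grep pattern is just one way of checking the feature list):

```shell
# Disable MMP on the OST (run only with the target unmounted everywhere).
tune2fs -O ^mmp /dev/sdd

# Re-enable MMP; this normally prints the MMP interval and block number,
# but in our case it printed nothing.
tune2fs -O mmp /dev/sdd

# Verify the feature flags; 'mmp' should appear in the feature list.
tune2fs -l /dev/sdd | grep -i 'features\|mmp'
```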

At this point, we can mount the OST targets using ldiskfs in read-only 
mode.  When we attempt to mount them as part of a Lustre volume, we get 
the following error:
Aug  9 09:25:53 oss-0-25 kernel: LDISKFS-fs warning (device sdd): 
ldiskfs_multi_mount_protect: fsck is running on the filesystem
Aug  9 09:25:53 oss-0-25 kernel: LDISKFS-fs warning (device sdd): 
ldiskfs_multi_mount_protect: MMP failure info: last update time: 
1280954496, last update node: oss-0-25, last update device: /dev/sdd
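In case it helps with diagnosis, this is roughly how we are mounting 
the backing filesystem read-only and inspecting the MMP state (the 
/mnt/ost-test mount point is just an example, and dumpe2fs only shows 
the MMP fields if the installed e2fsprogs build understands the 
feature):

```shell
# Mount the backing ldiskfs filesystem directly, read-only, bypassing Lustre.
mkdir -p /mnt/ost-test
mount -t ldiskfs -o ro /dev/sdd /mnt/ost-test

# Inspect the superblock; with MMP enabled this should report the
# MMP block number and update interval.
dumpe2fs -h /dev/sdd | grep -i mmp

umount /mnt/ost-test
```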

We're not sure how to proceed at this point.  It seems like all of the 
filesystem objects are present (df reports the correct numbers).

Has anyone seen this before and worked their way through getting things 
back online?

Note:
Lustre version = 1.6.6 (using Sun's RPMs)
OS = CentOS 5.2
Kernel = 2.6.18-92.1.10.el5_lustre.1.6.6smp

Thanks much.

-Ed Walter
Carnegie Mellon University
