[Lustre-discuss] MDS read-only
wanglu at ihep.ac.cn
Mon Oct 8 21:23:54 PDT 2012
By the way, we also tried copying the MDT device with dd and mounting the replica, but the problem persists. In addition, the hardware monitor has not reported any errors. This looks much more like an ldiskfs error than a hardware error.
On 2012-10-9, at 12:04 PM, wanglu wrote:
> Dear all,
> Two of our MDSs have repeatedly gone read-only recently, following a single e2fsck run on Lustre 1.8.5. After the MDT has been mounted for a while, the kernel reports errors like:
> Oct 8 20:16:44 mainmds kernel: LDISKFS-fs error (device cciss!c0d1): ldiskfs_ext_check_inode: bad header/extent in inode #50736178: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> Oct 8 20:16:44 mainmds kernel: Aborting journal on device cciss!c0d1-8.
> and the MDS then becomes read-only.
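When the same error recurs across several mount attempts, it can help to tally which inodes the kernel complains about, so that successive e2fsck runs can be compared against them. The following is a minimal sketch, assuming the messages follow the format quoted above; the regular expression and the function name `collect_bad_inodes` are illustrative and not part of any Lustre tool, and other LDISKFS error formats may not match.

```python
import re
from collections import Counter

# Pattern modelled on the single kernel message quoted above (an assumption:
# other LDISKFS-fs error messages may be formatted differently).
LDISKFS_ERROR = re.compile(
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+): .*?inode #(?P<inode>\d+)"
)


def collect_bad_inodes(lines):
    """Count how often each (device, inode) pair appears in LDISKFS errors."""
    counts = Counter()
    for line in lines:
        match = LDISKFS_ERROR.search(line)
        if match:
            key = (match.group("device"), int(match.group("inode")))
            counts[key] += 1
    return counts


# The two log lines from the report, used here as sample input.
sample = [
    "Oct 8 20:16:44 mainmds kernel: LDISKFS-fs error (device cciss!c0d1): "
    "ldiskfs_ext_check_inode: bad header/extent in inode #50736178: "
    "invalid magic - magic 0, entries 0, max 0(0), depth 0(0)",
    "Oct 8 20:16:44 mainmds kernel: Aborting journal on device cciss!c0d1-8.",
]

for (device, inode), hits in collect_bad_inodes(sample).items():
    print(device, inode, hits)
```

In practice the input would be the lines of the syslog file rather than the hard-coded sample. Lines without an inode number, such as the journal abort message, are simply skipped.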
> This problem has made about 1 PB of data (0.1 billion files) inaccessible. We believe some structure in the MDT's local file system is corrupted, so we have tried to fix it with e2fsck, following the procedure in the Lustre manual. However, the cycle always goes like this:
> 1. Run e2fsck, which may or may not fix some errors.
> 2. Mount the MDT; it goes read-only after some client operations, and the whole system becomes unusable.
> 3. Run e2fsck again.
> We have tried three different Lustre versions (1.8.5, 1.8.6, and 1.8.8-wc) with their corresponding e2fsprogs, and the problem persists. Currently, we can only use Lustre with all clients mounted read-only, and we are trying to copy the whole file system. However, it takes a long time to generate the full directory structure and file list for 0.1 billion files.
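For a file list of this size, one generic approach is to stream paths to disk as they are discovered instead of collecting them in memory. The following is only a sketch under that assumption: `write_file_list` is a hypothetical helper built on Python's standard `os.scandir`, it has not been tested against a Lustre mount, and it does nothing to reduce the metadata load on the MDS itself.

```python
import io
import os
import tempfile


def write_file_list(root, out):
    """Walk root iteratively and write one non-directory path per line to out.

    Pending directories are kept on an explicit stack, so file paths are
    written out immediately rather than accumulated in memory.
    Returns the number of paths written.
    """
    count = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        out.write(entry.path + "\n")
                        count += 1
        except OSError as exc:
            # Note unreadable directories in the output and keep going.
            out.write(f"# skipped {current}: {exc}\n")
    return count


# Small self-contained demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "a", "b"))
    for rel in ("f1", os.path.join("a", "f2"), os.path.join("a", "b", "f3")):
        open(os.path.join(demo_root, rel), "w").close()
    buffer = io.StringIO()
    demo_count = write_file_list(demo_root, buffer)
    print(demo_count, "paths written")
```

For a real run, `out` would be a file opened for writing on storage outside the affected file system, and separate subtrees could be listed by independent processes.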
> Can anyone give us some suggestions? Thank you very much!
> Lu Wang