[Lustre-discuss] MDS read-only
wanglu at ihep.ac.cn
Mon Oct 8 21:04:08 PDT 2012
Two of our MDSs have repeatedly gone read-only since we ran e2fsck once on Lustre 1.8.5. After the MDT has been mounted for a while, the kernel reports errors like:
Oct 8 20:16:44 mainmds kernel: LDISKFS-fs error (device cciss!c0d1): ldiskfs_ext_check_inode: bad header/extent in inode #50736178: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
Oct 8 20:16:44 mainmds kernel: Aborting journal on device cciss!c0d1-8.
and the MDS becomes read-only.
This problem has made about 1 PB of data (0.1 billion files) inaccessible. We believe some structure in the MDT's local file system is corrupted, so we have tried to fix it with e2fsck, following the procedure in the Lustre manual. However, we always end up in the same loop:
1. Run e2fsck, which fixes some errors and leaves others unfixed.
2. Mount the MDT. After some client operations it goes read-only, and the whole system becomes unusable.
3. Run e2fsck again.
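To see whether each pass of this loop hits the same inodes or new ones, the kernel error lines can be tallied per device and inode. This is only a minimal sketch, assuming the messages look like the one quoted above; the pattern, the function name `collect_bad_inodes`, and the `sample` list are illustrative, and other ldiskfs messages may need a different pattern.

```python
import re
from collections import Counter

# Pattern derived from the single error line quoted in this post (an assumption:
# other LDISKFS-fs messages may be worded differently and would not match).
ERROR_RE = re.compile(
    r"LDISKFS-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+): .*?inode #(?P<inode>\d+)"
)


def collect_bad_inodes(lines):
    """Count matching LDISKFS-fs errors per (device, inode) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group("device"), int(match.group("inode")))] += 1
    return counts


# The two log lines from this post, used as sample input.
sample = [
    "Oct 8 20:16:44 mainmds kernel: LDISKFS-fs error (device cciss!c0d1): "
    "ldiskfs_ext_check_inode: bad header/extent in inode #50736178: "
    "invalid magic - magic 0, entries 0, max 0(0), depth 0(0)",
    "Oct 8 20:16:44 mainmds kernel: Aborting journal on device cciss!c0d1-8.",
]

if __name__ == "__main__":
    for (device, inode), n in sorted(collect_bad_inodes(sample).items()):
        print(device, inode, n)
```

Running it over the syslog after each remount and comparing the inode sets would show whether e2fsck is leaving the same inodes damaged or whether new ones appear each time.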
We have tried three different Lustre versions (1.8.5, 1.8.6, and 1.8.8-wc) with their corresponding e2fsprogs, and the problem persists. Currently we can only use Lustre with all clients mounted read-only, and we are trying to copy the whole file system. However, it takes a long time to generate the full directory structure and file list for 0.1 billion files.
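For the file list, one possible approach is a streaming walk that writes each path as soon as it is seen and skips directories it cannot read, instead of building the whole tree in memory. This is a rough sketch only, not what we currently run; the function names and the "d "/"f " output format are made up for illustration, and it has not been tried at this scale.

```python
import os


def iter_paths(root):
    """Yield (path, is_dir) for everything under root, without following symlinks."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    is_dir = entry.is_dir(follow_symlinks=False)
                    yield entry.path, is_dir
                    if is_dir:
                        stack.append(entry.path)
        except OSError:
            # Unreadable directory (for example a damaged one): skip it and go on.
            continue


def write_file_list(root, out_path):
    """Write one line per entry ("d <path>" or "f <path>") and return the count."""
    count = 0
    # surrogateescape lets file names that are not valid UTF-8 pass through unchanged.
    with open(out_path, "w", errors="surrogateescape") as out:
        for path, is_dir in iter_paths(root):
            out.write(("d " if is_dir else "f ") + path + "\n")
            count += 1
    return count
```

Several such walks could be started on different top-level directories from different clients, so that the list is produced in parallel and a failure in one subtree does not stop the rest.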
Can anyone give us some suggestions? Thank you very much!