[lustre-discuss] OSTs remounting read-only after ldiskfs journal error

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Thu Oct 19 09:23:16 PDT 2017


Recently, I ran into an issue where several of the OSTs on my Lustre file system went read-only.  When I checked the logs, I saw messages like these for several OSTs:

Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs: ldiskfs_getblk:834: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs error (device sfa0023): ldiskfs_getblk:834: inode #81: block 688560124: comm ll_ost00_022: journal_dirty_metadata failed: handle type 0 started at line 1723, credits 8/0, errcode -28
Oct  6 23:27:11 haven-oss2 kernel: Aborting journal on device sfa0023-8.
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs (sfa0023): Remounting filesystem read-only
Oct  6 23:27:11 haven-oss2 kernel: LDISKFS-fs error (device sfa0023) in osd_trans_stop:1830: error 28

(This looks a lot like LU-9740.)
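
For what it is worth, errcode -28 is -ENOSPC, which I read as the journal transaction running out of credits/space (the "credits 8/0" bit) rather than the OST itself being full.  If anyone wants to compare notes on journal sizes, the internal journal (inode 8) can be inspected read-only with something like this (device name is just the one from the logs; output details vary with the e2fsprogs version):

debugfs -R 'stat <8>' /dev/sfa0023          # "Size:" here is the size of the internal journal
dumpe2fs -h /dev/sfa0023 | grep -i journal  # journal inode and related superblock fields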

In an effort to get the file system back up, I unmounted the OSTs, rebooted the OSS servers, and then remounted the OSTs.  Most of the OSTs that had gone read-only mounted back up.  There were complaints that the file systems were “clean with errors” and needed fsck, but otherwise they seemed fine.  However, two of the OSTs would still fall back to read-only and report errors like this:

Lustre: haven-OST001a: Recovery over after 0:10, of 90 clients 90 recovered and 0 were evicted.
Lustre: haven-OST001a: deleting orphan objects from 0x0:1124076 to 0x0:1124289
LDISKFS-fs: ldiskfs_getblk:834: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata
LDISKFS-fs error (device sfa0027): ldiskfs_getblk:834: inode #81: block 72797184: comm ll_ost00_002: journal_dirty_metadata failed: handle type 0 started at line 1723, credits 8/0, errcode -28
Aborting journal on device sfa0027-8.
LDISKFS-fs (sfa0027): Remounting filesystem read-only
LustreError: 16018:0:(osd_io.c:1679:osd_ldiskfs_write_record()) sfa0027: error reading offset 20480 (block 5): rc = -28
LDISKFS-fs error (device sfa0027) in osd_trans_stop:1830: error 28
LustreError: 16954:0:(osd_handler.c:1553:osd_trans_commit_cb()) transaction @0xffff8807b97ba500 commit error: 2
LustreError: 16954:0:(osd_handler.c:1553:osd_trans_commit_cb()) Skipped 3 previous similar messages
LustreError: 16018:0:(osd_handler.c:1833:osd_trans_stop()) haven-OST001a: failed to stop transaction: rc = -28
LDISKFS-fs warning (device sfa0027): kmmpd:187: kmmpd being stopped since filesystem has been remounted as readonly.
LustreError: 16017:0:(tgt_lastrcvd.c:980:tgt_client_new()) haven-OST000e: Failed to write client lcd at idx 96, rc -30
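
(For reference, the unmount/remount itself was nothing special, just the usual ldiskfs-backed target mount along these lines; the mount point below is an example, not the real path:)

umount /mnt/lustre/ost001a
mount -t lustre /dev/sfa0027 /mnt/lustre/ost001a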

I ended up unmounting all the OSTs in the file system and running “e2fsck -fn” on them.  No problems were reported.  I then ran “e2fsck -fp” on the OSTs that were “clean with errors” so that the file system state would get reset to “clean”.  When I remounted everything, the same two OSTs would still go read-only every time.
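
In case the exact invocations matter, the checks were essentially the following (the device names/glob are examples, and everything was run with the OSTs unmounted):

for dev in /dev/sfa00*; do e2fsck -fn "$dev"; done   # read-only check; answers "no" to everything
e2fsck -fp /dev/sfa0023                              # preen pass, repeated for each "clean with errors" OST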

I did some digging with debugfs (the commands are pasted further down), and it looks like inode 81 corresponds to the last_rcvd file.  So I am wondering whether one of two things might be happening:

1) The journal is corrupted.  When the replay hits a transaction that modifies the last_rcvd file, that transaction fails and the journal replay aborts.  (In which case, is there some way to get around a corrupted journal?  The only idea I have come up with is sketched below.)

2) The journal is fine, but the last_rcvd file is somehow corrupted, which prevents the transaction from replaying.  (If that is the case, will Lustre regenerate the last_rcvd file if I delete it?)
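
For reference, mapping the inode number back to a pathname can be done read-only with debugfs (device name from the logs above); in my case it pointed straight at /last_rcvd:

debugfs -R 'ncheck 81' /dev/sfa0027    # maps inode 81 back to a pathname (last_rcvd here)
debugfs -R 'stat <81>' /dev/sfa0027    # size, timestamps, and block count for that inode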

Of course, it could be neither of those two things.
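
Regarding option 1: the only workaround I have come up with (untested, and I would appreciate a sanity check before I try it on a production OST) is to drop and recreate the internal journal with tune2fs on the unmounted device, roughly:

e2fsck -fy /dev/sfa0027               # let e2fsck deal with the journal/needs_recovery flag first
tune2fs -O ^has_journal /dev/sfa0027  # remove the internal journal
tune2fs -j -J size=1024 /dev/sfa0027  # recreate it (size in MB; 1024 is just an example)
e2fsck -f /dev/sfa0027                # final check before remounting

But I do not know whether that is safe with respect to whatever un-replayed transactions are still sitting in the journal, which is part of what I am asking.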

I am hoping that someone on the mailing list might have some experience with this so they can share their wisdom with me.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


