[lustre-discuss] lustre issue with OST setting to read-only mode as soon as writes are attempted. using Lustre 1.8.8

Colin Faber cfaber at gmail.com
Thu May 7 08:59:35 PDT 2015


Whoops, meant to respond here...

Anyways, it seems something is wrong with sdc2. What does SMART tell you? Any
notices about it in dmesg?
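
One note for reading the log quoted below: the "rc = -30" values look like
negated Linux errno codes, and decoding them suggests the LustreError lines
are a consequence of the read-only remount rather than its cause. Here is a
rough Python sketch of that decoding. The helper name decode_rc is made up
for illustration, the sample lines are copied from the quoted log with the
mail wrapping undone, and it assumes a Linux host, where errno 30 is EROFS.

```python
import errno
import os
import re

# Matches both "rc = -30" and "rc -30" as they appear in the quoted log.
# Lines that end in a bare "-30" with no "rc" in front are not matched.
RC_PATTERN = re.compile(r"\brc\s*=?\s*(-\d+)")


def decode_rc(line):
    """Return (errno name, message) for the first 'rc = -N' in a log line.

    Returns None when the line carries no return code in that form.
    """
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = -int(match.group(1))  # the log prints the errno negated
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


samples = [
    "8791:0:(filter_io_26.c:705:filter_commitrw_write()) "
    "error starting transaction: rc = -30",
    "5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) "
    "error starting handle for op 4 (108 credits): rc -30",
    "Aborting journal on device sdd5.",
]

for line in samples:
    print(decode_rc(line), "<-", line)
```

If that reading is right, every -30 is just EROFS (read-only file system),
so the lines worth chasing are the earlier ones: the journal abort on sdd5
and the "IO failure" in ldiskfs_mb_free_blocks on sdc2.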

On Thu, May 7, 2015 at 8:54 AM, Kurt Strosahl <strosahl at jlab.org> wrote:

> Good Morning,
>
>      We recently had an OST encounter an issue with what appears to be its
> journal.  The OST sits on a partition atop a RAID6 array, which was
> rebuilding due to a failed disk.  The OST has a journal on an external
> mirrored disk.  We unmounted the OST and ran the following:
>
>      e2fsck -y -C 0 /dev/sdc2 -j /dev/sdd5
>
>      After that we remounted the OST, and as soon as the first client
> tried to write to it after recovery it went back to read-only.  We unmounted
> it again, ran e2fsck again, and again it flipped to read-only the moment
> writes tried to go to it (I had set it to read-only in the MDS, and let it
> sit for a few minutes before setting it back to read/write, to make sure
> that the problem happened only on a write).
>
> May  7 10:28:48  kernel:
> May  7 10:28:48  kernel: Aborting journal on device sdd5.
> May  7 10:28:48  kernel: LDISKFS-fs (sdc2): Remounting filesystem read-only
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_mb_free_blocks: IO failure
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_remove_space: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_orphan_del: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_reserve_inode_write: Journal has aborted
> May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in ldiskfs_ext_truncate: Journal has aborted
> May  7 10:28:48  kernel: LustreError: 2436:0:(filter_log.c:174:filter_recov_log_unlink_cb()) error destroying object 2760722: -30
> May  7 10:28:48  kernel: LustreError: 2434:0:(llog_cat.c:441:llog_cat_process_thread()) llog_cat_process() failed -30
> May  7 10:28:58  kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) can't get handle for 47 credits: rc = -30
> May  7 10:28:58  kernel: LustreError: 8791:0:(fsfilt-ldiskfs.c:501:fsfilt_ldiskfs_brw_start()) Skipped 54 previous similar messages
> May  7 10:28:58  kernel: LustreError: 8791:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
> May  7 10:28:59  kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) error starting handle for op 4 (108 credits): rc -30
> May  7 10:28:59  kernel: LustreError: 5245:0:(fsfilt-ldiskfs.c:367:fsfilt_ldiskfs_start()) Skipped 18 previous similar messages
> May  7 10:29:03  kernel: LustreError: 8793:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
> May  7 10:29:07  kernel: LustreError: 8711:0:(filter_io_26.c:705:filter_commitrw_write()) error starting transaction: rc = -30
>
> Kurt J. Strosahl
> System Administrator
> Scientific Computing Group, Thomas Jefferson National Accelerator Facility