[Lustre-discuss] OSS not healty

Brian J. Murrell Brian.Murrell at Sun.COM
Thu Mar 13 05:44:45 PDT 2008


On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:

> okay I've found the following in /var/log/messages before the bulk of above
> messages come. It seems that something with the RAID went wrong.

I don't see anything RAID-specific, however...

> Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072

This is pretty self-explanatory.  Something tried to write (rw=1) beyond
the end of the disk.  Something has a misunderstanding of how big the
disk is.  Is it possible that the disk format process was misled about
the disk size during initialization?
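
As a quick sanity check on those numbers, here is a minimal Python sketch
that pulls the fields out of the kernel message.  It assumes want and
limit are counted in 512-byte sectors and that the low bit of rw marks a
write; the helper name parse_beyond_end is made up for illustration.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors


def parse_beyond_end(line):
    """Parse an '<dev>: rw=N, want=N, limit=N' kernel line (hypothetical helper)."""
    m = re.search(r"(\w+): rw=(\d+), want=(\d+), limit=(\d+)", line)
    if m is None:
        return None
    rw, want, limit = (int(m.group(i)) for i in (2, 3, 4))
    return {
        "device": m.group(1),
        "write": bool(rw & 1),  # assumption: bit 0 of rw set means a write
        "want": want,           # sector just past the end of the request
        "limit": limit,         # device size in sectors
        "over_sectors": want - limit,
        "over_bytes": (want - limit) * SECTOR_BYTES,
    }


line = ("Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] "
        "sda: rw=1, want=11287722456, limit=7796867072")
info = parse_beyond_end(line)
print(info["device"], "write" if info["write"] else "read")
print("device size: %.2f TiB" % (info["limit"] * SECTOR_BYTES / 2**40))
print("request ends %.2f TiB past the end" % (info["over_bytes"] / 2**40))
```

Under those assumptions the device is roughly 3.6 TiB and the request
ends well past it, so this is not an off-by-a-few-sectors problem.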

Andreas, does mkfs do any bounds checking to verify the sanity of the
mkfs request?  I.e. when you specify a number of blocks for a
filesystem, does it make sure that many blocks are actually available
on the device?

Frank, is it at all possible that the size of the device had somehow
gotten smaller since you first initialized it?
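
One way to compare, sketched in Python: Linux exposes the current size of
a block device in /sys/block/<dev>/size, in 512-byte units, which can be
set against the sector count the filesystem was created with.  The
function names and the sysfs_root parameter are hypothetical, chosen only
for illustration.

```python
import os


def device_sectors(dev, sysfs_root="/sys/block"):
    """Return the size of a block device in 512-byte sectors, as sysfs reports it."""
    with open(os.path.join(sysfs_root, dev, "size")) as f:
        return int(f.read().strip())


def sectors_missing(expected_sectors, actual_sectors):
    """Sectors by which the device falls short of the expected size (0 if none)."""
    return max(0, expected_sectors - actual_sectors)


# Example, to be run on the OSS itself:
#   sectors_missing(expected, device_sectors("sda"))
# where `expected` is the sector count the filesystem was created with.
```

A non-zero result would mean the device is now smaller than the
filesystem believes it to be.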

> Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
> Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
> Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.

This is all just fallout error messages from the attempted write beyond
the end of the device.
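
The logical block number in the Buffer I/O error lines up with the want=
value if one assumes a 4096-byte filesystem block (eight 512-byte
sectors).  That block size is an assumption; the log does not state it.
A small Python sketch of the arithmetic:

```python
SECTOR_BYTES = 512
BLOCK_BYTES = 4096  # assumption: 4 KiB filesystem blocks
SECTORS_PER_BLOCK = BLOCK_BYTES // SECTOR_BYTES


def block_end_sector(block):
    """Sector number just past the end of the given filesystem block."""
    return (block + 1) * SECTORS_PER_BLOCK


def device_blocks(limit_sectors):
    """Number of whole filesystem blocks that fit on the device."""
    return limit_sectors // SECTORS_PER_BLOCK


block = 3170786573   # from "Buffer I/O error on device sda, logical block ..."
want = 25366292592   # from "sda: rw=1, want=..., limit=..."
limit = 7796867072

print(block_end_sector(block) == want)   # does the block end at want?
print(block >= device_blocks(limit))     # is the block outside the device?
```

Both checks hold for the values in the log: the device has room for
974608384 such blocks, so block 3170786573 is far outside it.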

> Mar 13 06:17:31 chic2e24 kernel: [3068633.702226] LustreError: 4493:0:(obd.h:1038:obd_transno_commit_cb()) chicfs-OST0010: transno 6510615555435490347 commit error: 2
> Mar 13 06:17:31 chic2e24 kernel: [3068633.702933] LDISKFS-fs error (device sda) in ldiskfs_reserve_inode_write: Journal has aborted
> Mar 13 06:17:31 chic2e24 kernel: [3068633.703587] Remounting filesystem read-only
> Mar 13 06:17:31 chic2e24 kernel: [3068633.704001] journal commit I/O error
> Mar 13 06:17:31 chic2e24 kernel: [3068633.704981] LDISKFS-fs error (device sda) in ldiskfs_dirty_inode: Journal has aborted

And this is the ldiskfs fallout.

b.




