[Lustre-discuss] Lustre client question
Zachary Beebleson
zbeeble at math.uchicago.edu
Fri May 13 10:39:53 PDT 2011
We recently had two RAID rebuilds on a couple of storage targets that did not go
according to plan. The cards reported a successful rebuild in each case, but
ldiskfs errors started showing up on the associated OSSs and the affected OSTs
were remounted read-only. We are planning to migrate the data off, but we've
noticed that some clients are getting I/O errors, while others are not. As an
example, a file that has a stripe on at least one affected OST could not be
read on one client, i.e. I received a read error trying to access it, while it
was perfectly readable and apparently uncorrupted on another (I am able to
migrate the file to healthy OSTs by copying to a new file name). The clients
with the I/O problem see inactive devices corresponding to the read-only OSTs
when I run 'lfs df', while the others without the I/O problems report the
targets as normal. Is it just that many clients are not aware of an OST problem
yet? I need clients with minimal I/O disruptions in order to migrate as much
data off as possible.
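
To tell the two groups of clients apart, the 'lfs df' output from each client could be scanned for inactive targets. The following is only a minimal sketch in Python: the exact wording of the inactive line ("<target> : inactive device"), the sample text, and the names inactive_targets and SAMPLE are assumptions for illustration, not output copied from our systems.

```python
import re

# Assumed shape of an inactive-target line in `lfs df` output.
# This pattern is an assumption, not copied from a real client.
INACTIVE_RE = re.compile(r"^(?P<target>\S+)\s*:\s*inactive device\s*$")


def inactive_targets(lfs_df_output):
    """Return the target names that the given `lfs df` text marks as inactive."""
    found = []
    for line in lfs_df_output.splitlines():
        match = INACTIVE_RE.match(line.strip())
        if match:
            found.append(match.group("target"))
    return found


# Hypothetical sample text; file system name and numbers are made up.
SAMPLE = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
example-MDT0000_UUID     1000000      100000      900000  10% /mnt/example[MDT:0]
example-OST0000_UUID     5000000     2500000     2500000  50% /mnt/example[OST:0]
example-OST0001_UUID : inactive device
example-OST0002_UUID : inactive device
"""

if __name__ == "__main__":
    for target in inactive_targets(SAMPLE):
        print(target)
```

A client that returns an empty list would be one that still reports the targets as normal, and so a candidate for running the migration.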
Rebooting a client appears to make it aware that there are problems with the
OSTs. However, I need the clients to be able to read the data in order to
migrate it off. Is there a way to reconnect the clients to the problematic OSTs?
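
For reference, the copy-to-a-new-name migration mentioned above could be scripted roughly as follows, so that files hitting read errors are recorded rather than stopping the run. This is a sketch under assumptions: the function names and the ".migrated" suffix are hypothetical, and it relies on new files being allocated on healthy OSTs, as I observed with the manual copy.

```python
import os
import shutil


def copy_to_new_name(src, suffix=".migrated"):
    """Copy src to src + suffix. Return (True, None) or (False, error)."""
    dst = src + suffix
    try:
        shutil.copyfile(src, dst)
    except OSError as exc:
        # A failed read (for example EIO) can leave a partial copy behind;
        # remove it so it is not mistaken for a good one.
        if os.path.exists(dst):
            os.remove(dst)
        return False, exc
    return True, None


def migrate_all(paths, suffix=".migrated"):
    """Try to copy every path; return (copied, failed) lists."""
    copied, failed = [], []
    for path in paths:
        ok, err = copy_to_new_name(path, suffix)
        if ok:
            copied.append(path)
        else:
            failed.append((path, str(err)))
    return copied, failed
```

The failed list could then be retried from a different client, since a file unreadable on one client was readable on another.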
We have made copies of the OSTs with dd so that we could run e2fsck against
them, but the results were not promising. The check aborted with:
------
Resize inode (re)creation failed: A block group is missing an inode table.Continue? yes
ext2fs_read_inode: A block group is missing an inode table while reading inode 7 in recreate inode
e2fsck: aborted
------
Any advice would be greatly appreciated.
Zach