[Lustre-discuss] Lustre client question

Zachary Beebleson zbeeble at math.uchicago.edu
Fri May 13 12:22:51 PDT 2011


Kevin,

I just failed the drive and remounted. A basic 'df' hangs when it reaches the
mount point, but /proc/fs/lustre/health_check reports everything is healthy.
'lfs df' on a client now reports the OST as active, where it was inactive
before. However, I'm now working with a degraded volume, though it is RAID 6.
Should I try another rebuild, or just proceed with the migration off of this
OST as soon as possible?
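
For reference, this is roughly how I'm checking the target from the OSS and a
client; the mount point and file path below are placeholders, not our real
ones:

------
# On the OSS: Lustre's own health report for the server
cat /proc/fs/lustre/health_check

# On a client: per-target free space and state; the rebuilt OST should
# show up again rather than being reported inactive
lfs df -h /mnt/lustre

# Check which OSTs a particular file is striped across before copying it
lfs getstripe /mnt/lustre/path/to/file
------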

Thanks,
Zach

PS. Sorry for the repeat message
On Fri, 13 May 2011, Kevin Van Maren wrote:

> See bug 24264 -- certainly possible that the raid controller corrupted your 
> filesystem.
>
> If you remove the new drive and reboot, does the file system look cleaner?
>
> Kevin
>
>
> On May 13, 2011, at 11:39 AM, Zachary Beebleson <zbeeble at math.uchicago.edu> wrote:
>
>> 
>> We recently had two raid rebuilds on a couple of storage targets that did
>> not go according to plan. The cards reported a successful rebuild in each
>> case, but ldiskfs errors started showing up on the associated OSSs and the
>> affected OSTs were remounted read-only. We are planning to migrate the data
>> off, but we've noticed that some clients are getting I/O errors, while
>> others are not. As an example, a file that has a stripe on at least one
>> affected OST could not be read on one client, i.e. I received a read error
>> trying to access it, while it was perfectly readable and apparently
>> uncorrupted on another (I am able to migrate the file to healthy OSTs by
>> copying it to a new file name). The clients with the I/O problem see
>> inactive devices corresponding to the read-only OSTs when I issue an
>> 'lfs df', while the others without the I/O problems report the targets as
>> normal. Is it just that many clients are not yet aware of an OST problem?
>> I need clients with minimal I/O disruptions in order to migrate as much
>> data off as possible.
>> 
>> A client reboot appears to awaken them to the fact that there are problems
>> with the OSTs. However, I need them to be able to read the data in order
>> to migrate it off. Is there a way to reconnect the clients to the
>> problematic OSTs?
>> 
>> We have dd-ed copies of the OSTs to try e2fsck against them, but the
>> results were not promising. The check aborted with:
>> 
>> ------
>> Resize inode (re)creation failed: A block group is missing an inode table.
>> Continue? yes
>> 
>> ext2fs_read_inode: A block group is missing an inode table while reading
>> inode 7 in recreate inode
>> e2fsck: aborted
>> ------
>> 
>> Any advice would be greatly appreciated.
>> Zach
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
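
For completeness, below is a rough sketch of how I've been locating files with
stripes on a bad OST and copying them off, plus the read-only fsck we ran
against the dd image. The mount point, OST name/UUID, device number, and
scratch paths are placeholders for ours, and the MDS-side deactivate is just
what I understand to be the usual way to keep new files off the bad target:

------
# On the MDS: find the osc device for the bad OST and mark it inactive so
# newly written copies are not striped onto it
lctl dl | grep OST0002
lctl --device <devno> deactivate

# On a client: list every file with at least one stripe on that OST
lfs find /mnt/lustre --obd lustre-OST0002_UUID > /tmp/ost0002.files

# Copy each file to a new name (stripes land on healthy OSTs only),
# then rename it over the original
while read -r f; do
    cp -a "$f" "$f.mig" && mv "$f.mig" "$f"
done < /tmp/ost0002.files

# Read-only e2fsck (Lustre e2fsprogs) against the dd image; -n writes
# no changes
e2fsck -fn /scratch/ost0002.img
------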


