[Lustre-discuss] File corrupted after fail-overing

Đàm Thanh Tùng tungdt at isds.vn
Thu Jun 18 01:03:14 PDT 2009


Hi everybody,
I'm a newbie with Lustre, and I'm sorry if my question is too basic or has
already been answered elsewhere.
I have a problem with Lustre OST failover.
I have 2 OSSs configured to fail over to each other. Each OSS has its own
OST (I didn't use a shared disk for my 2 OSSs), and both OSTs use the same
OST index.
This is everything I've done:

- On my MDS: mkfs.lustre --verbose --mdt --mgs /dev/sdb
             mount -t lustre /dev/sdb /mnt/lustre
- On my OSSs:

OSS1: mkfs.lustre --ost --mgsnode=192.168.1.200@tcp0 \
          --failover=192.168.1.202@tcp0 --index=lustre-OST0000 /dev/sdb

mount -t lustre /dev/sdb /mnt/lustre

OSS2: mkfs.lustre --ost --mgsnode=192.168.1.200@tcp0 \
          --failover=192.168.1.201@tcp0 --index=lustre-OST0000 /dev/sdb

mount -t lustre /dev/sdb /mnt/lustre

Everything worked well.

I then ran my own test:
- Copy a large file to the Lustre mount on my client. While it was still
being written, I unmounted the OST on one of my OSSs (the one receiving the
data; I verified this by looking at the df -h output on each OSS and
lfs getstripe on the client).
- The failover worked well, at least according to everything shown in the
OSS logs and my MDS log. The copy stalled at that moment; after recovery,
once the MDS had switched its connection to the active OSS, it continued
and finished without any error.

But the problem is: when I used the md5sum command to verify the file I had
just copied, its checksum did not match that of the original file. I
repeated the test many times and got almost the same result each time.
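
The verification step can be sketched roughly as below. This is only an
illustration: the function name verify_copy and the paths in the usage note
are hypothetical, and it assumes md5sum and cmp are available on the client.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): compare a source file with its
# copy on the Lustre mount. Returns 0 when the MD5 checksums match, 1 otherwise.
verify_copy() {
    src=$1
    dst=$2
    # Read via stdin so md5sum prints only the digest and "-"
    src_sum=$(md5sum < "$src" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$dst" | cut -d' ' -f1)
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "OK: checksums match ($src_sum)"
        return 0
    fi
    echo "MISMATCH: $src_sum (source) vs $dst_sum (copy)"
    # cmp reports the first differing byte, which hints at where the copy
    # diverged from the original (for example, around the failover moment)
    cmp "$src" "$dst" || true
    return 1
}
```

It could be called as verify_copy /tmp/bigfile /mnt/lustre/bigfile, where
both paths are placeholders for the original file and its copy on the client.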

Is there any way to overcome this problem?

Any help would be greatly appreciated.