[Lustre-discuss] Using drbd: reformat disk or only sync ?

Dam Thanh Tung tungdt at isds.vn
Sat Nov 21 07:34:38 PST 2009



On Sat, Nov 21, 2009 at 3:25 PM, Andreas Dilger <adilger at sun.com> wrote:
On 2009-11-20, at 19:36, Dam Thanh Tung wrote:
We just started DRBD on the OST whose RAID partition had been rebuilt
and connected it to DRBD on a working OST. Everything was fine, and the
synchronization completed without any errors reported. But when we
mounted this backup OST in our system, some of our web clients couldn't
connect to it (the MDS and some others could), and after a short time
we saw many errors like these in that OST's message log:

Nov 19 19:59:36 OST6 kernel: LDISKFS-fs error (device drbd6):  
ldiskfs_lookup: unlinked inode 159588368 in dir #261333022

Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init())
lustre-OST0006: bad object 996598/0: rc -2

It sounds to me like you are trying to mount the "backup OST" at the
same time as the "primary OST"?  That is definitely NOT how Lustre
works.  If you are doing that, you should stop, as it will cause
serious filesystem corruption.

The backup OST should only be mounted when the primary has failed  
(preferably when the primary is powered down via STONITH so that there  
is no chance it will still modify the filesystem).  This is normally  
controlled by HA software like Heartbeat or similar.
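
The rule above can be written down as a small check. This is only an
illustrative sketch in Python, not how Heartbeat or any other HA
software implements it. The function and parameter names are
hypothetical, and it assumes the stricter reading in which fencing is
required rather than merely preferred.

```python
def may_mount_backup(primary_is_down: bool, primary_is_fenced: bool) -> bool:
    """Return True only when it looks safe to mount the backup OST.

    Hypothetical helper: in a real setup both flags would be supplied
    by the HA software, not computed here.
    """
    # The backup must never be mounted while the primary could still
    # modify the filesystem, so require a confirmed failure AND fencing
    # (for example, a power-off via STONITH).
    return primary_is_down and primary_is_fenced
```

With this check, a primary that merely looks unreachable but has not
been fenced would not trigger a mount of the backup.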

Thank you for your fast reply, Andreas

Maybe I did not explain this clearly, so you misunderstood me.

I only mounted our backup OST after my primary OST went down, and it
showed me those error reports.

To use DRBD as a backup solution as I described above, do we need to
reformat the disk before synchronizing the data, or can we just sync
it directly?


I haven't used DRBD myself, but I believe that it should NOT require  
formatting a device before using DRBD on it.  However, there would  
need to be an initial synchronization to copy all of the data from the  
primary copy to the backup.  DRBD is just doing a block-level copy of
one device to another; it doesn't know anything about the filesystem.
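
To picture what "block-level copy" means, here is a minimal Python
sketch. It does not use or model DRBD itself: the block size, the
function name and the byte strings standing in for on-disk data are
all made up for illustration. It only shows that such a copy moves
bytes without interpreting them, so under that assumption the copy
would be exactly as consistent, or as damaged, as its source.

```python
BLOCK_SIZE = 4  # tiny block size, for illustration only


def block_sync(source: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Copy `source` block by block, without interpreting its contents."""
    target = bytearray(len(source))
    for offset in range(0, len(source), block_size):
        # Each block is copied verbatim; nothing here inspects or
        # validates the filesystem structures stored inside it.
        target[offset:offset + block_size] = source[offset:offset + block_size]
    return bytes(target)


# Made-up "device" images: the bytes stand in for on-disk metadata.
healthy = b"dir->inode:ok   "
damaged = b"dir->inode:BAD  "

# The copy completes in both cases and is byte-identical to its source,
# so a damaged source yields an equally damaged copy.
healthy_copy = block_sync(healthy)
damaged_copy = block_sync(damaged)
```

In other words, a synchronization that completes without errors says
the two devices hold the same blocks; it says nothing about whether
the filesystem stored in those blocks is consistent.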

If I don't need to reformat before synchronizing data, could you
please tell me why we got those errors?  The synchronization
completed successfully!  (We synchronized data from the primary OST to
the backup OST, and when I mounted the backup OST, I got those errors.
Our primary OST contains data but can't connect to our MDS; because of
that, we have to use the backup OST, as I described before on this list.)

Everything is getting worse and worse. I hope you can help me bring the
data back. If you need more information, I'll send it in detail.

Many thanks


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



