[Lustre-discuss] Lustre-discuss Digest, Vol 46, Issue 33

Dam Thanh Tung tungdt at isds.vn
Wed Nov 18 23:27:16 PST 2009


I tried using tunefs.lustre to re-set the failover parameter for my OST (the
tunefs.lustre --dryrun output did show that parameter), but it didn't help; a
rough sketch of what I ran is below. Does anyone else have any idea?
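
Roughly, assuming our OST device /dev/drbd6 and the backup OSS NID
192.168.1.67@tcp (adjust these for your own setup; this is only a sketch of
my attempt, not a recipe):

  # inspect the parameters currently recorded on the OST device
  tunefs.lustre --dryrun /dev/drbd6

  # re-set the failover NID on the OST
  tunefs.lustre --failnode=192.168.1.67@tcp /dev/drbd6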

Thank you in advance !!!!

On Thu, Nov 19, 2009 at 5:33 AM, Dam Thanh Tung <tungdt at isds.vn> wrote:

> On Thu, Nov 19, 2009 at 2:00 AM, <lustre-discuss-request at lists.lustre.org>wrote:
>
>> Send Lustre-discuss mailing list submissions to
>>        lustre-discuss at lists.lustre.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>        http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> or, via email, send a message with subject or body 'help' to
>>        lustre-discuss-request at lists.lustre.org
>>
>> You can reach the person managing the list at
>>        lustre-discuss-owner at lists.lustre.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Lustre-discuss digest..."
>>
>>
>> Today's Topics:
>>
>>   1. MDS doesn't switch to failover OST node (Dam Thanh Tung)
>>   2. Re: MDS doesn't switch to failover OST node (Brian J. Murrell)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 18 Nov 2009 22:54:28 +0700
>> From: Dam Thanh Tung <tungdt at isds.vn>
>> Subject: [Lustre-discuss] MDS doesn't switch to failover OST node
>> To: lustre-discuss at lists.lustre.org
>> Message-ID:
>>        <a119d1570911180754i3ee81f30wad5a0dd1cdb47e05 at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi list
>>
>> I am encountering a problem with OST-MDS connectivity. Because of a RAID
>> card hang, our OST went down this morning, and when I tried to mount the
>> failover node for that OST, a problem occurred:
>>
>> The MDS only sent requests to the OST that was down and didn't connect to
>> our backup (failover) OST, so our backup solution was useless and we lost
>> all data from that OST. It's really a disaster for me, because we already
>> lost all of our data once before with the same kind of problem: the OST
>> couldn't connect to the MDS!
>>
>> We use DRBD between the OSTs to synchronize data. The backup (failover)
>> node mounted successfully without any error, but no clients connected to
>> it for recovery:
>>
>> cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
>> status: RECOVERING
>> recovery_start: 0
>> time_remaining: 0
>> connected_clients: 0/1
>> delayed_clients: 0/1
>> completed_clients: 0/1
>> replayed_requests: 0/??
>> queued_requests: 0
>> next_transno: 30064771073
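>>
>> For reference, bringing the backup OST up on the failover node was just
>> the usual DRBD promote plus Lustre mount, roughly as follows (the DRBD
>> resource name r6 and the mount point /mnt/ost6 here are illustrative,
>> not our exact names):
>>
>>   drbdadm primary r6                     # promote the backup's DRBD device
>>   mount -t lustre /dev/drbd6 /mnt/ost6   # mount the OST on this node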
>>
>> In the MDS's message log, we only saw connection attempts to our dead OST:
>>
>> Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
>> lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit
>> 56s).
>> ......
>>
>> The output of the lctl dl command on the MDS:
>>
>> lctl dl
>>  0 UP mgs MGS MGS 25
>>  1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
>>  2 UP mdt MDS MDS_uuid 3
>>  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>>  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
>>  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>>  6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>>  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>>  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>>  9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>>
>> I did activate OST6 (lctl --device 7 activate), but it didn't help.
>>
>>
>>
>> Could anyone tell me how to point the MDS at our backup OST (with IP
>> address 192.168.1.67, for example) so that we can bring our OST back up?
>>
>> Any help would be really appreciated !
>>
>> I hope I can receive your answers or suggestions as soon as possible.
>>
>> Best Regards
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 18 Nov 2009 11:10:51 -0500
>> From: "Brian J. Murrell" <Brian.Murrell at Sun.COM>
>> Subject: Re: [Lustre-discuss] MDS doesn't switch to failover OST node
>> To: lustre-discuss at lists.lustre.org
>> Message-ID: <1258560651.30445.59.camel at pc.interlinx.bc.ca>
>> Content-Type: text/plain; charset="utf-8"
>>
>> On Wed, 2009-11-18 at 22:54 +0700, Dam Thanh Tung wrote:
>> > Hi list
>>
>> Hi,
>>
>> > The MDS only sent requests to the OST that was down and didn't connect
>> > to our backup (failover) OST, so our backup solution was useless and we
>> > lost all data from that OST.
>>
>
> Hi Brian
>
> Thank you for your fast reply.
>
>>
>> I don't think you have actually lost any data.  It's there.  Your
>> clients (and the MDS is effectively one of them) just don't know to use
>> the failover OSS that you have set up (but have not told Lustre about).
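>>
>> As a quick sanity check (the NID below is the backup OSS address you
>> mentioned; the OST device path is illustrative), you can confirm that
>> LNET on the MDS can reach the backup OSS at all, and see what failover
>> information is actually recorded on the OST device:
>>
>>   # from the MDS: is the backup OSS reachable over LNET?
>>   lctl ping 192.168.1.67@tcp
>>
>>   # on the OSS that holds the OST: what parameters are stored on disk?
>>   tunefs.lustre --dryrun /dev/drbd6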
>>
>> > It's really a disaster for me, because we already lost all of our data
>> > once before with the same kind of problem: the OST couldn't connect to
>> > the MDS!
>>
>> Failures to connect between nodes do not result in data loss.  The
>> data is still there.  You just need to have your clients access it.
>>
>>
>
> I know the data is still there, but I say "lost" when I can no longer
> access it.
>
> On our client, we mounted with parameters like this:
>
> mount -t lustre -o flock 192.168.1.78@tcp:192.168.1.80@tcp:/lustre /mnt/lustre/
>
> We didn't unmount our clients; we just deactivated the dead OST and, after
> mounting the backup one, activated it again. But because the MDS couldn't
> connect to or receive any information from the backup (failover) OST, the
> clients are in the same state.
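>
> Concretely, the sequence on the MDS was roughly this (device number 7 is
> the lustre-OST0006-osc entry from the lctl dl output above):
>
>   lctl --device 7 deactivate   # stop the MDS from retrying the dead OST
>   # ... mount the backup OST on the failover node ...
>   lctl --device 7 activate     # re-enable the OSC once the backup is up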
>
>
>
>> > Could anyone tell me how to point the MDS at our backup OST (with IP
>> > address 192.168.1.67, for example) so that we can bring our OST back
>> > up?
>>
>> It sounds like you need to review the failover section of the manual.
>>
>> In summary, you need to tell the clients about failover nodes
>> (--failnode) when you create the filesystem.  You can also add this
>> after the fact with tunefs.lustre; a minimal sketch of both is below.
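>>
>> A minimal sketch of both forms, with placeholder NIDs, index and device
>> (substitute your real values):
>>
>>   # at format time: record the backup OSS NID for this OST
>>   mkfs.lustre --ost --mgsnode=<mgs-nid> --failnode=<backup-oss-nid> \
>>       --index=<N> /dev/<ost-device>
>>
>>   # after the fact: add the failover NID to an existing OST
>>   tunefs.lustre --failnode=<backup-oss-nid> /dev/<ost-device>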
>>
>
> On our OST, before it went down because of the RAID card hang, we had
> formatted it with:
>
>   mkfs.lustre --ost --mgsnode=192.168.1.78@tcp --mgsnode=192.168.1.80@tcp --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6
>
> Could you please give me some suggestions? Do I need to provide more
> information?
>
> Many thanks
>
>>
>> b.
>>
>> ------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>> End of Lustre-discuss Digest, Vol 46, Issue 33
>> **********************************************
>>
>
>

