[Lustre-discuss] Lustre-discuss Digest, Vol 46, Issue 33

Dam Thanh Tung tungdt at isds.vn
Wed Nov 18 14:33:38 PST 2009


On Thu, Nov 19, 2009 at 2:00 AM, <lustre-discuss-request at lists.lustre.org> wrote:

>
> Today's Topics:
>
>   1. MDS doesn't switch to failover OST node (Dam Thanh Tung)
>   2. Re: MDS doesn't switch to failover OST node (Brian J. Murrell)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 18 Nov 2009 22:54:28 +0700
> From: Dam Thanh Tung <tungdt at isds.vn>
> Subject: [Lustre-discuss] MDS doesn't switch to failover OST node
> To: lustre-discuss at lists.lustre.org
> Message-ID:
>        <a119d1570911180754i3ee81f30wad5a0dd1cdb47e05 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi list
>
> I am encountering a problem with the OST-MDS connection. Because a RAID card
> hung, our OST went down this morning, and when I tried to mount the failover
> node of that OST, a problem occurred:
>
> The MDS only sent requests to the OST that was down and didn't connect to our
> backup (failover) OST, so our backup solution was useless and we lost all data
> from that OST. It's really a disaster for me because we lost all of our data
> once before with the same kind of problem: the OST can't connect to the MDS!
>
> We use DRBD between the OSTs to synchronize data. The backup (failover) node
> was mounted successfully without any error, but it did not have any clients
> to recover, as shown here:
>
> cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
> status: RECOVERING
> recovery_start: 0
> time_remaining: 0
> connected_clients: 0/1
> delayed_clients: 0/1
> completed_clients: 0/1
> replayed_requests: 0/??
> queued_requests: 0
> next_transno: 30064771073
>
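
A minimal sketch, assuming a POSIX shell with awk on the OSS, of how the two fields that matter here (status and connected_clients) could be pulled out of a recovery_status dump. The function name recovery_summary is hypothetical, and the sample input is copied from the output quoted above.

```shell
# Summarize a Lustre recovery_status dump read from stdin.
# Prints "<status> <connected>/<expected>".
# recovery_summary is a hypothetical helper name, not a Lustre tool.
recovery_summary() {
    awk -F': *' '
        $1 == "status"            { status = $2 }
        $1 == "connected_clients" { clients = $2 }
        END { print status, clients }
    '
}

# Sample taken from the quoted output. On a live OSS you would redirect
# /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status into the function.
recovery_summary <<'EOF'
status: RECOVERING
recovery_start: 0
connected_clients: 0/1
completed_clients: 0/1
EOF
```

With the sample above this prints "RECOVERING 0/1", which suggests that nothing (the MDS included) has reached the failover target yet.
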
> In the MDS's message log, we only saw connection attempts to our dead OST:
>
> Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
> lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit
> 56s).
> ......
>
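
To see at a glance which NID the MDS keeps trying, the timeout messages can be tallied per device and NID. This is only a sketch: timeout_targets is a hypothetical helper name, it assumes each message sits on a single line in the log (the mail wrapped it) and that the NID is written as 192.168.1.66@tcp (the list archive renders "@" as " at ").

```shell
# Count "has timed out" requests per (OSC device, NID) pair.
# Reads kernel log text on stdin; prints "<device> <nid> <count>" per pair.
# timeout_targets is a hypothetical helper name, not a Lustre tool.
timeout_targets() {
    sed -n 's/.*sent from \([^ ]*\) to NID \([^ ]*\) .*has timed out.*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Sample line rebuilt from the log excerpt quoted above.
timeout_targets <<'EOF'
Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit 56s).
EOF
```

If only the address of the dead node ever shows up in this tally, the MDS probably has no failover NID recorded for that OST.
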
> The output of the lctl dl command on the MDS:
>
> lctl dl
>  0 UP mgs MGS MGS 25
>  1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
>  2 UP mdt MDS MDS_uuid 3
>  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
>  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
>  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
>  6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
>  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
>  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
>  9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
>
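
A small sketch, assuming awk is available on the MDS, for picking out the devices that lctl dl does not report as UP. The helper name inactive_devices is hypothetical; the sample lines are copied from the listing above.

```shell
# List devices whose state column in `lctl dl` output is not UP.
# Reads the listing on stdin; prints "<index> <state> <name>" per device.
# inactive_devices is a hypothetical helper name, not a Lustre tool.
inactive_devices() {
    awk 'NF >= 4 && $2 != "UP" { print $1, $2, $4 }'
}

# Sample taken from the quoted listing. On the MDS itself the input
# would come from: lctl dl | inactive_devices
inactive_devices <<'EOF'
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
EOF
```

For the sample this prints "7 IN lustre-OST0006-osc", and the first column is the index that lctl --device expects.
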
> I activated OST6 (lctl --device 7 activate), but it didn't help.
>
>
>
> Could anyone tell me how to make the MDS connect to our backup OST (with IP
> address 192.168.1.67, for example) so that we can bring our OST up?
>
> Any help would be really appreciated !
>
> I hope to receive your answers or suggestions as soon as possible.
>
> Best Regards
>
> ------------------------------
>
> Message: 2
> Date: Wed, 18 Nov 2009 11:10:51 -0500
> From: "Brian J. Murrell" <Brian.Murrell at Sun.COM>
> Subject: Re: [Lustre-discuss] MDS doesn't switch to failover OST node
> To: lustre-discuss at lists.lustre.org
> Message-ID: <1258560651.30445.59.camel at pc.interlinx.bc.ca>
> Content-Type: text/plain; charset="utf-8"
>
> On Wed, 2009-11-18 at 22:54 +0700, Dam Thanh Tung wrote:
> > Hi list
>
> Hi,
>
> > MDS only sent request to the OST which was down and didn't connect to
> > our backup (failover) OST, so our backup solution was useless, we lost
> > all data from that OST.
>

Hi Brian

Thank you for your fast reply.

>
> I don't think you have actually lost any data.  It's there.  Your
> clients (which the MDS is) just don't know to use the failover OSS that
> you have set up (but not told Lustre about).
>
> > It's really a disaster for me because we even lost all of our data
> > before with the same kind of problem: OST can't connect to MDS !!!!
>
> Failures to connect between nodes do not result in data loss.  The
> data is still there.  You just need to have your clients access it.
>
>

I know that the data is still there, but I call it "lost" when I can no
longer access it.

On our clients, we mounted with parameters like this:

mount -t lustre -o flock 192.168.1.78@tcp:192.168.1.80@tcp:/lustre /mnt/lustre/

We didn't unmount our clients; we just deactivated the dead OST and, after
mounting the backup one, activated it again. But because the MDS couldn't
connect to or receive any information from the backup (failover) OST, the
clients are in the same state.



> > Could anyone tell me how to route MDS to connect to our backup OST
> > ( with ip address 192.168.1.67 , for example ) ? , to bring our OST
> > up ?
>
> It sounds like you need to review the failover section of the manual.
>
> In summary, you need to tell the clients about failover nodes
> (--failnode) when you create the filesystem.  You can add this feature
> after-the-fact with tunefs.lustre.
>
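
As a rough illustration of the tunefs.lustre step mentioned above, the sketch below only builds and prints a command line, using the --dryrun option so that nothing would be written even if it were run. The helper name failnode_cmd is hypothetical, and the NID 192.168.1.67@tcp and device /dev/drbd6 are assumptions taken from the examples in this thread. Whether the configuration logs also need to be regenerated afterwards depends on the Lustre version, so the failover section of the manual should be checked first.

```shell
# Build (but do not run) a tunefs.lustre command that would record a
# failover NID on a target. Arguments: <failover NID> <block device>.
# failnode_cmd is a hypothetical helper name, not a Lustre tool.
failnode_cmd() {
    # --dryrun makes tunefs.lustre report what it would do without
    # changing the disk.
    printf 'tunefs.lustre --dryrun --failnode=%s %s\n' "$1" "$2"
}

# Assumed values: the backup OSS address and DRBD device from this thread.
failnode_cmd 192.168.1.67@tcp /dev/drbd6
```

The printed command would be run on the OSS with the target unmounted, and --dryrun dropped only once its output looks right.
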

Before our OST went down because of the RAID card hang, we created it with:


  mkfs.lustre --ost --mgsnode=192.168.1.78@tcp --mgsnode=192.168.1.80@tcp \
      --failover=192.168.1.66@tcp --index=6 --verbose --writeconf /dev/drbd6

Could you please give me some suggestions? Do I need to provide more
information?

Many thanks

>
> b.
>