[Lustre-discuss] MDS doesn't switch to failover OST node

Dam Thanh Tung tungdt at isds.vn
Wed Nov 18 07:54:28 PST 2009


Hi list

I am having a problem with the connection between the MDS and an OST. Because
our RAID card hung, our OST went down this morning, and when I tried to mount
the failover node for that OST, the following problem occurred:

The MDS only sent requests to the OST that was down and did not connect to our
backup (failover) OST, so our backup solution was useless and we lost all data
from that OST. This is a real disaster for me, because we have lost all of our
data before with the same kind of problem: the OST can't connect to the MDS!

We use DRBD between the OSTs to synchronize data. The backup (failover) node
mounted successfully without any errors, but it had no clients to recover, as
shown here:

cat /proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/1
delayed_clients: 0/1
completed_clients: 0/1
replayed_requests: 0/??
queued_requests: 0
next_transno: 30064771073
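To check the failover OST's recovery state from a script rather than by eye, a minimal sketch like the following could parse the `key: value` lines of the `recovery_status` output above. It assumes only the format seen in that output; the function names `parse_recovery_status` and `clients_connected` are hypothetical, and `SAMPLE` is copied from the output above. A `connected_clients` value of `0/1` means none of the expected clients has reconnected yet.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines from an obdfilter recovery_status dump into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def clients_connected(status):
    """Return (connected, expected) from a 'connected_clients: N/M' entry."""
    connected, _, expected = status["connected_clients"].partition("/")
    return int(connected), int(expected)


# Copied from the recovery_status output in this message.
SAMPLE = """status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/1
delayed_clients: 0/1
completed_clients: 0/1
replayed_requests: 0/??
queued_requests: 0
next_transno: 30064771073
"""

if __name__ == "__main__":
    status = parse_recovery_status(SAMPLE)
    connected, expected = clients_connected(status)
    if status["status"] == "RECOVERING" and connected < expected:
        print(f"still waiting: {connected}/{expected} clients reconnected")
```

On the OSS itself, the same function could be fed the text of `/proc/fs/lustre/obdfilter/lustre-OST0006/recovery_status` instead of `SAMPLE`.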

In the MDS message log, we only saw connection attempts to our dead OST:

Nov 18 22:44:03 MDS1 kernel: Lustre: Request x1314965674069373 sent from
lustre-OST0006-osc to NID 192.168.1.66@tcp 56s ago has timed out (limit
56s).
......

The output of the lctl dl command on the MDS:

lctl dl
  0 UP mgs MGS MGS 25
  1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
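One quick way to spot the problem in a listing like the one above is to print every device whose status is not `UP`; here, device 7 (`lustre-OST0006-osc`) shows `IN` instead. The sketch below assumes the six whitespace-separated columns seen in this output (index, status, type, name, UUID, reference count); `Device` and `parse_lctl_dl` are hypothetical names, and `SAMPLE` is copied from the listing above.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    status: str
    dev_type: str
    name: str
    uuid: str
    refcount: int


def parse_lctl_dl(text: str) -> List[Device]:
    """Parse `lctl dl` lines, assuming six whitespace-separated columns."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        idx, status, dev_type, name, uuid, ref = fields
        devices.append(Device(int(idx), status, dev_type, name, uuid, int(ref)))
    return devices


# Copied from the `lctl dl` output in this message.
SAMPLE = """\
  0 UP mgs MGS MGS 25
  1 UP mgc MGC192.168.1.78@tcp 0681a267-849f-350c-5b2c-6869c794550f 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 15
  5 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0005-osc lustre-mdtlov_UUID 5
"""

if __name__ == "__main__":
    for dev in parse_lctl_dl(SAMPLE):
        if dev.status != "UP":
            print(f"device {dev.index} ({dev.name}) is {dev.status}")
```

Run against this listing, it reports only device 7, the OSC for OST0006.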

I did activate OST6 (lctl --device 7 activate), but it didn't help.

Could anyone tell me how to make the MDS connect to our backup OST (with IP
address 192.168.1.67, for example) so that we can bring our OST back up?

Any help would be really appreciated !

I hope to receive your answers or suggestions as soon as possible.

Best Regards