[Lustre-discuss] OSTs can't connect to MDS

Dam Thanh Tung tungdt at isds.vn
Sat Sep 26 18:18:32 PDT 2009


Hi everyone!

I currently have serious trouble with the OST-MDS connection. My Lustre file
system has 1 MDS and 3 OSTs (each MDS and OST has a backup node, synchronized
by DRBD).
Yesterday, maybe because my partner moved the CATALOGS file while our devices
were mounted as ldiskfs, everything went down: none of our OSTs can connect
to my MDS. I tried unmounting everything and remounting, but it didn't help.
Mounting the disks on the MDS and OSTs works fine, but after recovery we see
an error like this in the MDS log:

Sep 26 05:46:51 MDS1 kernel: LustreError: 6161:0:(mds_lov.c:984:__mds_lov_synchronize()) lustre-OST0003_UUID failed at update_mds: -22

The MDS then deactivates our OSTs; all of them are in the INACTIVE state on
the MDS:

lctl dl
  0 UP mgs MGS MGS 15
  1 UP mgc MGC192.168.1.78@tcp dd7b40bd-ab09-d972-7e3a-fc62205b4968 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 7
  5 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  6 IN osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 IN osc lustre-OST0005-osc lustre-mdtlov_UUID 5
  9 IN osc lustre-OST0004-osc lustre-mdtlov_UUID 5
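
For anyone who wants to check the same two things on their own system, here is a minimal Python sketch that lists the devices not in the UP state and decodes the return code from the log. It assumes the `lctl dl` columns are index, state, type, name, UUID and reference count, and that -22 is a negated Linux errno; the helper names `parse_lctl_dl` and `describe_rc` are made up for this illustration and are not part of Lustre.

```python
import errno
import os

# Sample taken from the `lctl dl` listing above, with the MGC NID written
# as 192.168.1.78@tcp so that every line has exactly six fields.
LCTL_DL = """\
  0 UP mgs MGS MGS 15
  1 UP mgc MGC192.168.1.78@tcp dd7b40bd-ab09-d972-7e3a-fc62205b4968 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 7
  5 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  6 IN osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 IN osc lustre-OST0005-osc lustre-mdtlov_UUID 5
  9 IN osc lustre-OST0004-osc lustre-mdtlov_UUID 5
"""


def parse_lctl_dl(text):
    """Parse device lines, assuming six whitespace-separated columns."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a device line.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        index, state, dev_type, name, uuid, refs = fields
        devices.append({
            "index": int(index),
            "state": state,
            "type": dev_type,
            "name": name,
            "uuid": uuid,
            "refs": int(refs),
        })
    return devices


def describe_rc(rc):
    """Map a negated errno (assumed convention) to its symbolic name and message."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


devices = parse_lctl_dl(LCTL_DL)
inactive = [d["name"] for d in devices if d["state"] != "UP"]
print(inactive)
print(describe_rc(-22))
```

On Linux the symbolic name for 22 is EINVAL (invalid argument), which is why the parameters are the first thing to suspect.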



Because of the rc -22 report, I tried changing the parameters on our OSTs (in
fact, I only erased them and set them again to the old values; they worked
well for 4 months, so I don't think the parameters themselves are the
problem), but it didn't help and produced another error.

When I mount one of my OSTs (the parameters on both the OST and the MDS were
adjusted with tunefs.lustre), I get this:

mount.lustre: mount /dev/sdc at /mnt/lustre failed: Input/output error
Is the MGS running?

The OST and MDS can reach each other without any problem, by both ping and
lctl ping!

I also mounted my MDT as ldiskfs and removed CATALOGS and CONFIGS, which
didn't help :(
Having tried everything else in vain, I reformatted the OST and MDS like this:

mkfs.lustre --reformat --verbose --writeconf --ost
--mgsnode=192.168.1.78@tcp:192.168.1.80@tcp
--failover=192.168.1.82@tcp --index=1 /dev/sdc

mkfs.lustre --reformat --mgs --mdt --failover=192.168.1.80@tcp --writeconf
/dev/sda4
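
Since a mistyped NID in options like these may be one reason a target cannot reach the MGS, here is a rough Python sketch for sanity-checking NID strings before they are passed to mkfs.lustre. It assumes a NID has the form IPv4 address, `@`, network name (for example `tcp`), and that several NIDs are separated by colons as in the --mgsnode value above; `NID_RE` and `invalid_nids` are hypothetical names for this illustration, not Lustre tools.

```python
import re

# Assumed NID shape for this sketch: dotted IPv4 address, "@", then a
# network name such as tcp, tcp0 or o2ib. Other NID forms are not covered.
NID_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}@[a-z][a-z0-9]*$")


def invalid_nids(spec):
    """Return the colon-separated entries of spec that do not look like NIDs."""
    return [nid for nid in spec.split(":") if not NID_RE.match(nid)]


# The --mgsnode value from the command above.
print(invalid_nids("192.168.1.78@tcp:192.168.1.80@tcp"))  # → []

# A hypothetical value with another option fused onto the NID.
print(invalid_nids("192.168.1.82@tcp--index=1"))  # → ['192.168.1.82@tcp--index=1']
```

This only checks the textual shape of each entry; it says nothing about whether the node behind the NID is actually reachable.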

After the reformat nothing has changed; I still get the "Is the MGS running?"
error :(


Given all the problems described above, could you please give me any advice
or a solution? It's really a disaster for me now.

Is there any way to fix the "failed at update_mds: -22" error?
Is there any way to fix the "Is the MGS running?" error?

I still have all of my data on the MGS backup node (it has the same problem
as MDS1 but has not been reformatted). Could anyone please show me how to
move it safely to my new MDS?


Any help would be highly appreciated :(

I hope you can reply as soon as possible. Many thanks