[Lustre-discuss] The client profile could not be read from the MGS

Wojciech Turek wjt27 at cam.ac.uk
Tue Jan 5 09:57:33 PST 2010


Hello everyone and Happy New Year,

On my MDS server I have two file systems, work and work2. Yesterday I
reconfigured the file system named 'work' and ran writeconf in order to
recreate its configuration logs. I ran writeconf while the other file
system, work2, was running. Both file systems share the same MGS, and I
think that writeconf cleared the CONFIGS directory on the MGS for both
of them. I didn't see any problems immediately after I ran writeconf,
until I unmounted work2 from one of the client servers. When I tried
to mount it back, this message appeared:

mount.lustre: mount 10.44.245.203@tcp:/work2 at /scratch2 failed: Invalid argument
This may have multiple causes.
Is 'work2' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.

And the syslog on the clients says:
Jan  5 17:15:47 node-h01 kernel: LustreError: 156-2: The client profile 'work2-client' could not be read from the MGS.  Does that filesystem exist?
Jan  5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jan  5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jan  5 17:15:47 node-h01 kernel: Lustre: client ffff81016d4dd000 umount complete
Jan  5 17:15:47 node-h01 kernel: LustreError: 7936:0:(obd_mount.c:1980:lustre_fill_super()) Unable to mount  (-22)

I have done some searching and found one similar problem reported on
this mailing list. The suggestion was to check whether the client
profile file exists in the CONFIGS directory.

On my MDS node I ran this command:
debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv
debugfs 1.40.7.sun3 (28-Feb-2008)
/dev/drbd_mds03_vg/mgs_lv: catastrophic mode - not reading inode or group bitmaps
 303105   40777 (2)      0      0    4096  4-Jan-2010 11:39 .
      2   40755 (2)      0      0    4096 22-May-2009 10:59 ..
 303106  100644 (1)      0      0   12288 22-May-2009 10:59 mountdata
 303107  100644 (1)      0      0   28704  4-Jan-2010 05:15 work-client
 303108  100644 (1)      0      0   27936  4-Jan-2010 05:15 work-MDT0000
 303109  100644 (1)      0      0    8880  4-Jan-2010 05:16 work-OST0000
 303110  100644 (1)      0      0    8880  4-Jan-2010 05:16 work-OST0001
 303111  100644 (1)      0      0    8880  4-Jan-2010 05:17 work-OST0002
 303112  100644 (1)      0      0    8880  4-Jan-2010 05:17 work-OST0003
 303113  100644 (1)      0      0    8880  4-Jan-2010 05:18 work-OST0004
 303114  100644 (1)      0      0    8880  4-Jan-2010 05:21 work-OST0005
 303115  100644 (1)      0      0    8880  4-Jan-2010 05:21 work-OST0006
 303116  100644 (1)      0      0    8880  4-Jan-2010 05:21 work-OST0007
 303117  100644 (1)      0      0    8880  4-Jan-2010 05:22 work-OST0008
 303118  100644 (1)      0      0    8880  4-Jan-2010 05:23 work-OST0009
 303119  100644 (1)      0      0    8880  4-Jan-2010 05:23 work-OST000a
 303120  100644 (1)      0      0    8880  4-Jan-2010 05:23 work-OST000b
 303121  100644 (1)      0      0       0  4-Jan-2010 11:39 work2-client

The work2-client file is zero size, and all the OST and MDT files for
the work2 file system are missing.
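The same check can be scripted instead of read by eye. A minimal sketch, assuming the debugfs `ls -l` column layout shown above (inode, mode, link count, uid, gid, size, date, time, name) and that per-target logs are named FSNAME-MDTxxxx / FSNAME-OSTxxxx, as they are for 'work'. The function name check_configs is hypothetical. It reports an empty or absent client profile and the absence of any MDT/OST logs for the given file system name.

```shell
#!/bin/sh
# check_configs FSNAME  (hypothetical helper)
# Reads `debugfs -c -R 'ls -l CONFIGS' <mgs-device>` output on stdin and
# reports empty or missing configuration logs for FSNAME.
# Assumed columns: inode mode (links) uid gid size date time name
check_configs() {
    awk -v name="$1" '
        # Regular files (mode 100xxx) whose name starts with "FSNAME-".
        NF >= 9 && $2 ~ /^100/ && index($9, name "-") == 1 {
            if ($9 == name "-client") client = 1
            else if ($9 ~ /-(MDT|OST)[0-9a-f]+$/) targets++
            # A zero-length log is as unusable as a missing one.
            if ($6 == 0) print "EMPTY   " $9
        }
        END {
            if (!client)  print "MISSING " name "-client"
            if (!targets) print "MISSING " name "-MDT*/OST* target logs"
        }'
}
```

Usage on the MDS node would be `debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv | check_configs work2`; an empty result means every expected log for that name is present and non-empty.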

Is there a way to recover these files without stopping the work2 file system?

If I unmount all work2 OSTs and the MDT, run writeconf on them, and then
mount them back, would this recreate the missing files?
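A minimal sketch of that sequence, assuming the order given in the Lustre Operations Manual for regenerating configuration logs: with the clients, the MDT and all OSTs of the file system unmounted, run `tunefs.lustre --writeconf` on the MDT first and then on each OST, then mount the MDT before the OSTs, with clients last. Since the MGS here is a separate device shared with 'work', it is assumed to stay mounted throughout. The function name writeconf_plan and the /mnt/... mount points are hypothetical; the function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# writeconf_plan MDT_DEV OST_DEV...  (hypothetical helper)
# Prints, but does not execute, the writeconf sequence for one file system.
# Assumes clients, MDT and all OSTs of that file system are already unmounted
# and that the (shared) MGS stays mounted. Mount points are placeholders.
writeconf_plan() {
    mdt=$1; shift
    # 1. Regenerate configuration logs: MDT first, then every OST.
    echo "tunefs.lustre --writeconf $mdt"
    for ost in "$@"; do
        echo "tunefs.lustre --writeconf $ost"
    done
    # 2. Restart: MDT before the OSTs (clients are mounted last, not shown).
    echo "mount -t lustre $mdt /mnt/mdt"
    n=0
    for ost in "$@"; do
        echo "mount -t lustre $ost /mnt/ost$n"
        n=$((n + 1))
    done
}
```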

Also, can I do the above without unmounting the clients (letting them
wait until the Lustre targets come back), and would this kill any jobs
running on them?

Many thanks for your input

Cheers

Wojciech


-- 
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
