[Lustre-discuss] Huge Sparse files in ROOT partition of MDT

Nirmal Seenu nirmal@fnal.gov
Fri Mar 6 05:26:13 PST 2009


While trying to figure out why LVM2 snapshots were failing on our
MDT server, I found that there are a lot of sparse files on the MDT
volume. The file size shown in ls output on the MDT is the
same as the real file size. At this point the tar runs for a few hours,
even if I try to use the --sparse option in the tar command.
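For anyone unfamiliar with the option, GNU tar's --sparse makes tar record only the non-hole regions of a file, so the archive stays small; note, though, that tar still has to read through the file to find the holes, which may be where the hours go. A minimal illustration with a synthetic file (the file names here are made up, not MDT objects):

```shell
# Create a wholly sparse 100 MB file: large apparent size, near-zero disk usage.
truncate -s 100M demo_sparse

# Without --sparse, tar reads and stores all 100 MB of zeros.
tar -cf demo_plain.tar demo_sparse

# With --sparse, tar records only the hole map, so the archive stays tiny.
tar --sparse -cf demo_sparse.tar demo_sparse

# Compare archive sizes: demo_plain.tar is >100 MB, demo_sparse.tar a few KB.
ls -l demo_plain.tar demo_sparse.tar

rm demo_sparse demo_plain.tar demo_sparse.tar
```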

The total MDT partition usage itself is about 500MB (as reported by df),
and it used to take me less than 10 minutes to create an LVM2 snapshot
and tar it up when I was running the servers on Lustre 1.6.5 with no
quota enabled.

I recently upgraded my Lustre servers to 1.6.7 and tried to enable quota
on the MDT and OST by running the following commands:

tunefs.lustre --erase-params --mdt --mgsnode=iblustre1@tcp1 \
    --param lov.stripecount=1 --writeconf --param mdt.quota_type=ug \
    /dev/mapper/lustre1_volume-mds_lv

tunefs.lustre --erase-params --ost --mgsnode=iblustre1@tcp1 \
    --param ost.quota_type=ug --writeconf /dev/sdc1

I was never able to run a "lfs quotacheck" successfully due to LBUGS.

I was able to create an LVM2 snapshot and look at the contents of the
MDT. Every directory except ROOT seems to have the correct content.
Some of the files under ROOT still show 0 bytes, while others show
around 35GB (the files themselves are sparse, as seen from od output):

-rw-r--r-- 1 ***** ***          0 Jan 24 08:36 
l48144f21b747m0036m018-Coul_000505


-rw-rw-r-- 1 ***** *** 36691775424 Feb 22 19:56 
prop_WALL_pbc_m0.033_LS16_t0_002080
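One way to see whether a file like this is merely sparse (as opposed to actually occupying 35GB) is to compare its apparent size with its allocated blocks. A small sketch with a synthetic file (the name is hypothetical, not one of the MDT objects above):

```shell
# Make a file with a 35 GB apparent size that occupies almost no disk space.
truncate -s 35G demo_object

# Apparent size in bytes vs. actually allocated 512-byte blocks.
stat -c 'apparent=%s bytes  allocated=%b blocks' demo_object

# du agrees: near-zero real usage, 35G apparent size.
du -h demo_object
du -h --apparent-size demo_object

rm demo_object
```

On an MDT this is the expected pattern: the inode carries the file's size for ls, while the data lives on the OSTs, so the MDT copy should allocate almost no blocks.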

At this point I am curious to know whether this is the expected behaviour
of the MDT or some corruption of the MDT file system. Do we have to live
with the fact that the backup process using LVM2 snapshots takes a few
hours to complete if quotas are enabled?

Thanks for your help in advance.
Nirmal
