[Lustre-discuss] Lustre FS Corruption

Andreas Dilger adilger at clusterfs.com
Fri Oct 5 14:56:01 PDT 2007


On Oct 03, 2007  21:54 -0400, Charles Taylor wrote:
> We have a 4-way SMP server (dual opteron 275s) configured as a  
> combined MGS/MDS and OSS server as thus...
> 
> /dev/sda              205G  1.8G  192G   1% /lustre/mri/mdt0
> /dev/f2c0l0/lv-f2c0l0
>                        3.4T  2.3T  1.2T  67% /lustre/mri/ost0
> /dev/f2c0l1/lv-f2c0l1
>                        3.4T  3.0T  460G  87% /lustre/mri/ost1
> /dev/f2c1l0/lv-f2c1l0
>                        3.4T  3.0T  399G  89% /lustre/mri/ost2
> /dev/f2c1l1/lv-f2c1l1
>                        3.4T  3.0T  418G  88% /lustre/mri/ost3
> /dev/f3c0l0/lv-f3c0l0
>                        3.4T  3.0T  430G  88% /lustre/mri/ost4
> /dev/f3c0l1/lv-f3c0l1
>                        3.4T  3.0T  431G  88% /lustre/mri/ost5
> /dev/f3c1l0/lv-f3c1l0
>                        3.4T  3.0T  378G  90% /lustre/mri/ost6
> /dev/f3c1l1/lv-f3c1l1
>                        3.4T  3.0T  417G  88% /lustre/mri/ost7
> 
> 
> Under heavy load our server has gone down several times (we think due  
> to bug 13438).   Although we have successfully run e2fsck locally on  
> the MDS and each OSS  AND run lfsck according to the documentation,  
> we still seem to be missing about 9TB of our storage.  That is to say  
> that "du -s -h *" finds about 14TB but "df -h" says that the file  
> system is practically full.
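
As a rough cross-check of the figures quoted above, the Used column for the eight OSTs can be totalled and compared with the du result. This is only a sketch: the numbers are copied by hand from the df listing, df rounds each one to a single decimal place, and the names OST_USED_TB, DU_TOTAL_TB and unaccounted_tb are made up for illustration.

```python
# Used space per OST in TB, copied from the df listing above (ost0..ost7).
OST_USED_TB = [2.3, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]

# Total reported by "du -s -h *" on the client, in TB (the "about 14TB").
DU_TOTAL_TB = 14.0


def unaccounted_tb(ost_used, du_total):
    """Return the space df counts as used that du does not find, in TB."""
    return round(sum(ost_used) - du_total, 1)


if __name__ == "__main__":
    gap = unaccounted_tb(OST_USED_TB, DU_TOTAL_TB)
    print(f"OSTs report {sum(OST_USED_TB):.1f}TB used, du finds "
          f"{DU_TOTAL_TB:.1f}TB, gap is {gap:.1f}TB")
```

With these inputs the gap comes out at roughly 9.3TB, in line with the "about 9TB" mentioned above. The MDT (1.8G used) is small enough to ignore here.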

Presumably this is still true after stopping the clients and servers
and restarting them?  Space can remain in use if, e.g., some clients are
holding open-unlinked files.  The MDS can also hold files on behalf of
crashed clients in case they return after recovery, and in some cases
that space is not reclaimed until after an MDS or OST restart.
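
To illustrate the open-unlinked case in general terms, here is a minimal sketch on an ordinary local POSIX filesystem (not Lustre-specific, and not a diagnostic tool). The function name open_unlinked_demo is hypothetical. It writes a temporary file, unlinks it while the descriptor is still open, and shows that the data is still there even though no path refers to it, which is why du cannot see the space while df still counts it.

```python
import os
import tempfile


def open_unlinked_demo(size=1 << 20):
    """Unlink a file while it is still open.

    Returns (link_count, size_in_bytes, path_still_exists) as seen
    through the open descriptor after the unlink.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        f.write(b"\0" * size)
        f.flush()
        os.unlink(path)            # the name is gone, so du cannot find it
        st = os.fstat(f.fileno())  # the inode is still alive while open
        return st.st_nlink, st.st_size, os.path.exists(path)
    # Closing the descriptor (leaving the "with" block) releases the space.


if __name__ == "__main__":
    nlink, nbytes, exists = open_unlinked_demo()
    print(f"links={nlink} bytes={nbytes} path_exists={exists}")
```

The space is only returned once the last holder closes the file, which is why a restart of the clients is a reasonable first check.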

The other issues to be aware of are described in the KB articles:
https://bugzilla.lustre.org/show_bug.cgi?id=2381
https://bugzilla.lustre.org/show_bug.cgi?id=2378

However, that still doesn't explain where 7.5TB of space went.

Presumably lfsck didn't report any space leakage?  In case you weren't
aware, lfsck takes no action by default; you need to ask it explicitly
to delete or link orphan objects on the OSTs.


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
