[Lustre-discuss] Bad distribution of files among OSTs

Thomas Roth t.roth at gsi.de
Sat Oct 31 01:01:26 PDT 2009


Thanks, Andreas.
Indeed we are running Lustre 1.6.7.2, on kernel 2.6.22, Debian Etch. But no recent upgrade was
involved; we moved from 1.6.7.1 to .2 back in July.

The procedure you described has the slight disadvantage that the OSTs in question have to be taken
offline. It would be nice if Robinhood did the same job on a live system - according to its manual,
it can purge data on a per-OST basis if an OST becomes too full. However, I haven't yet found a way
to extract just the information about these OSTs without deleting files.
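
For reference, the per-OST purge the manual describes is driven by a trigger block along these
lines - quoted from memory, so the parameter names and thresholds are my guesses rather than
verified syntax:

    # Robinhood purge policy (sketch; names and values are assumptions)
    purge_trigger
    {
        trigger_on         = OST_usage ;  # watch each OST's fill level
        high_watermark_pct = 85% ;        # start purging when an OST hits 85%
        low_watermark_pct  = 80% ;        # stop once it is back down to 80%
    }

What I'm missing is the reporting half of this without the purging half.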

In fact, I am in the process of collecting this information "manually": I now have quite a number
of lists of users' data from running "lfs find --obd OST... /lustre/...", but I haven't run these
lists through "ls -lh" yet. Too busy moving the files instead of measuring them ;-)
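
In case it is useful to anyone, the collection step is essentially this (the OST UUIDs and the
/lustre mount point are examples from our setup):

    #!/bin/bash
    # For each overfull OST, list all files that have objects on it,
    # then turn the list into a size-sorted report (largest first)
    # to pick candidates for migration.
    for ost in lustre-OST002a_UUID lustre-OST0041_UUID; do
        lfs find --obd "$ost" /lustre > "files_on_${ost}.txt"
        # column 5 of "ls -l" is the file size in bytes
        xargs -d '\n' ls -l < "files_on_${ost}.txt" \
            | sort -k5,5nr > "sizes_on_${ost}.txt"
    done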

Regards,
Thomas

Andreas Dilger wrote:
> On 2009-10-30, at 12:07, Thomas Roth wrote:
>> in our 196-OST cluster, the previously perfect distribution of files
>> among the OSTs has not been working for about two weeks.
>> Most OSTs are between 57% and 62% full, but some (~10) have
>> risen to 94%. I'm trying to fix that by deactivating these OSTs
>> on the MDT and finding and migrating data away from them, but it seems
>> I'm not fast enough and it's an ongoing problem - I've just deactivated
>> another OST at a threatening 67%.
> 
> Is this correlated to some upgrade of Lustre?  What version are you using?
> 
> 
>> Our qos_prio_free is at the default 90%.
>>
>> Our OSTs range in size from 2.3TB to 4.5TB. We use striping level 1, so
>> it would be possible to fill up an OST just by creating a single 2TB
>> file. However, I'm not aware of any such gigafiles (we use Robinhood to
>> get a picture of our file system).
> 
> To fill the smallest OST from 60% to 90% would only need a few files that
> total 0.3 * 2.3TB, or 690GB.  One way to find such files is to mount the
> full OSTs with ldiskfs and run "find /mnt/ost/O/0 -size +100G" to list the
> object IDs that are very large.  In bug 21244 I've written a small
> program that dumps the MDS inode numbers for the specified objects.  You
> can then run "debugfs -c -R 'ncheck {list of inode numbers}' /dev/${mdsdev}"
> on the MDS to find the pathnames of those files.
> 
> Cheers, Andreas
> -- 
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
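
P.S. for the archive: the procedure Andreas describes, as a shell sketch.
The device name, mount point and inode numbers are placeholders, and the
object-to-inode step is the small program from bug 21244:

    # On the OSS, with the OST taken out of service:
    # mount it read-only as ldiskfs and list the object IDs of huge objects
    mount -t ldiskfs -o ro /dev/sdX /mnt/ost
    find /mnt/ost/O/0 -size +100G

    # Run the bug 21244 tool on those objects to get the MDS inode numbers,
    # then resolve the inode numbers to pathnames on the MDS:
    debugfs -c -R 'ncheck 1234567 2345678' /dev/mdsdev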

-- 
--------------------------------------------------------------------
Thomas Roth
Gesellschaft für Schwerionenforschung
Planckstr. 1                -         64291 Darmstadt, Germany
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
