[Lustre-discuss] Uneven distribution of data over OSTs
Mohr Jr, Richard Frank (Rick Mohr)
rmohr at utk.edu
Fri Mar 9 05:23:31 PST 2012
One possibility is that a user ran "lfs setstripe -i" and forced a specific starting OST index. If that was done on a directory, I believe all files created in that directory would inherit the value. This could lead to a case where no individual file was large, but all of those files ended up on the same OST and filled it up.
HPC System Administrator
National Institute for Computational Sciences
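The inheritance effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Lustre's allocator: `place_files` is a hypothetical helper, plain round-robin stands in for the real default allocation, and every file is assumed to have a stripe count of 1 (as in the original question), so the starting OST is the only OST that file uses.

```python
from collections import Counter

NUM_OSTS = 12  # the question describes a file system with 12 OSTs


def place_files(num_files, start_index=None, num_osts=NUM_OSTS):
    """Count how many stripe-count-1 files land on each OST index.

    start_index=None: a simplified round-robin stand-in for the allocator.
    start_index=k:    every file inherits a directory default that pins the
                      starting OST to k (the effect attributed to
                      `lfs setstripe -i k` on a directory).
    """
    counts = Counter()
    next_ost = 0
    for _ in range(num_files):
        if start_index is None:
            ost = next_ost
            next_ost = (next_ost + 1) % num_osts
        else:
            # With one stripe per file, the starting OST holds all the data.
            ost = start_index
        counts[ost] += 1
    return counts


if __name__ == "__main__":
    print("round-robin:    ", dict(sorted(place_files(24).items())))
    print("pinned to OST 3:", dict(place_files(24, start_index=3)))
```

With round-robin the 24 files spread two per OST; with the pinned index all 24 land on OST 3, which is the pattern of many moderately sized files filling a single OST.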
On Mar 7, 2012, at 10:42 AM, "Grigory Shamov" <gas5x at yahoo.com> wrote:
> Dear Lustre-Users,
> Recently we had an issue with how file data was distributed over our Lustre OSTs. Our Lustre storage cluster consists of two OSS servers in active-active failover mode. The Lustre version is 1.8, possibly with DDN patches.
> The cluster has 12 OSTs of 7.3 TB each. Normally they are about 60% full (4.5 TB or so), but recently one of them filled up completely (99%), with two others not far behind (80%). The rest of the OSTs stayed at the usual 60%.
> Why would that happen? Shouldn't Lustre try to distribute the space evenly? I checked the filled OSTs for large files, and there were none large enough to explain the difference (that is, on the order of the gap between 99% and 60% occupancy, 2-3 TB). Some users did have large directories, but the files in them were about 5-10 GB each.
> I have checked our Lustre parameters: qos_prio_free appears to be at the default of 90%, qos_threshold_rr is 16%, and the stripe count is 1.
> Could you please suggest what might have caused this behavior, and whether there are any tunables or better threshold values we could change to avoid such imbalances?
> Thank you very much in advance!
> Grigory Shamov
> HPC Analyst,
> University of Manitoba
> Winnipeg MB Canada
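The size argument in the question can be checked with a little arithmetic. Using the quoted figures (7.3 TB per OST, a 60% baseline, one OST at 99%), the full OST holds roughly 7.3 × 0.39 ≈ 2.85 TB more than its peers, which at 5-10 GB per file works out to roughly 285 to 570 files. This sketch assumes decimal units (1 TB = 1000 GB); `excess_tb` and `files_needed` are hypothetical helpers.

```python
OST_CAPACITY_TB = 7.3   # per-OST size quoted in the question
BASELINE_PCT = 60.0     # the "usual" fill level of the other OSTs


def excess_tb(used_pct, baseline_pct=BASELINE_PCT, capacity_tb=OST_CAPACITY_TB):
    """Data on one OST beyond the baseline fill level, in TB."""
    return capacity_tb * (used_pct - baseline_pct) / 100.0


def files_needed(extra_tb, file_size_gb):
    """How many files of a given size add up to extra_tb (assumes 1 TB = 1000 GB)."""
    return extra_tb * 1000.0 / file_size_gb


if __name__ == "__main__":
    extra = excess_tb(99.0)
    print(f"excess on the full OST: {extra:.2f} TB")
    for size_gb in (5, 10):
        print(f"{size_gb} GB files needed: {files_needed(extra, size_gb):.0f}")
```

A few hundred files of the sizes the question mentions is a plausible amount for one directory, which is consistent with the pinned-index explanation in the reply.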
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
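The two tunables mentioned in the question govern the MDS object allocator: while free space across OSTs is roughly balanced it allocates round-robin, and once the imbalance exceeds qos_threshold_rr it switches to a weighted allocator that favors OSTs with more free space. The sketch below models only that decision and a free-space-proportional pick. The imbalance formula `(max_free - min_free) / max_free` is a simplifying assumption, not Lustre's exact code; qos_prio_free (which the Lustre manual describes as weighting free space against spreading objects across OSS servers) is not modeled; and `choose_allocator`, `weighted_pick` and `free_space` are hypothetical helpers. As far as we understand it, a file whose starting OST was fixed with `lfs setstripe -i` does not go through this choice at all, so neither tunable would prevent the imbalance described in the reply.

```python
import random

OST_CAPACITY_TB = 7.3  # per-OST size quoted in the question


def choose_allocator(free_tb, threshold_rr_pct):
    """Return "round-robin" or "weighted" for a list of per-OST free space.

    Simplified model (an assumption, not Lustre's exact formula): imbalance
    is (max_free - min_free) / max_free, compared with qos_threshold_rr.
    """
    max_free, min_free = max(free_tb), min(free_tb)
    if max_free <= 0:
        return "round-robin"
    imbalance_pct = 100.0 * (max_free - min_free) / max_free
    return "weighted" if imbalance_pct > threshold_rr_pct else "round-robin"


def weighted_pick(free_tb, rng):
    """Pick an OST index with probability proportional to its free space."""
    return rng.choices(range(len(free_tb)), weights=free_tb, k=1)[0]


def free_space(used_pcts, capacity_tb=OST_CAPACITY_TB):
    """Convert per-OST fill percentages into free terabytes."""
    return [capacity_tb * (100.0 - p) / 100.0 for p in used_pcts]


if __name__ == "__main__":
    balanced = free_space([60.0] * 12)
    skewed = free_space([99.0, 80.0, 80.0] + [60.0] * 9)
    print(choose_allocator(balanced, 16))  # all OSTs equally full
    print(choose_allocator(skewed, 16))    # the state reported in the question
    rng = random.Random(0)
    picks = [weighted_pick(skewed, rng) for _ in range(10_000)]
    print("picks of the 99%-full OST:", picks.count(0))
```

In this model the reported state is far past a 16% threshold, so new unpinned files would rarely be placed on the nearly full OST; only explicitly pinned files would keep landing there.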