[Lustre-discuss] Uneven distribution of data over OSTs

Grigory Shamov gas5x at yahoo.com
Wed Mar 7 07:41:28 PST 2012


Dear Lustre-Users,

Recently we had an issue with the distribution of file data across our Lustre OSTs. Our Lustre storage cluster consists of two OSS servers in active-active failover mode. The Lustre version is 1.8, possibly with DDN patches.

The cluster has 12 OSTs of 7.3 TB each. Normally they are about 60% full (roughly 4.5 TB each), but recently one of them filled up completely (99%), with two others not far behind (80%). The remaining OSTs stayed at the usual 60%.
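To put the imbalance in absolute terms, here is a quick calculation using only the figures above. It assumes 7.3 TB is the usable capacity of each OST and uses decimal units throughout; filesystem overhead and reserved blocks are ignored.

```python
OST_SIZE_TB = 7.3  # per-OST capacity stated in the post (assumed usable capacity)

def used_tb(fraction, size_tb=OST_SIZE_TB):
    """Data stored on one OST at a given fill fraction, in TB."""
    return fraction * size_tb

normal = used_tb(0.60)  # a typical OST
hot = used_tb(0.80)     # each of the two OSTs at 80%
full = used_tb(0.99)    # the OST that filled up

print(f"typical OST: {normal:.2f} TB")
print(f"99% OST: {full - normal:.2f} TB above a typical OST")
print(f"80% OSTs: {hot - normal:.2f} TB above a typical OST each")
print(f"total excess on the three OSTs: {(full - normal) + 2 * (hot - normal):.2f} TB")
```

The 60% figure works out to about 4.38 TB per OST, which matches the "roughly 4.5 TB" above, and the full OST carries about 2.85 TB more than a typical one.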

Why would this happen? Shouldn't Lustre try to distribute data evenly across the OSTs? I checked the filled OSTs for large files and found none large enough to explain the difference (that would take files on the order of the gap between 99% and 60% occupancy, i.e. 2-3 TB). Some users did have large directories, but the individual files were only about 5-10 GB each.
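With a stripe count of 1, each file's data lives entirely on a single OST, so a gap of this size does not require one huge file. A group of moderately sized files placed on the same OST is enough. The sketch below estimates how many 5-10 GB files would account for the roughly 2.85 TB gap. It uses decimal units, and `files_needed` is a hypothetical helper written for this illustration.

```python
import math

# Gap between the 99%-full OST and a typical 60%-full one, from the figures in the post.
GAP_TB = 0.99 * 7.3 - 0.60 * 7.3  # about 2.85 TB

def files_needed(gap_tb, file_gb):
    """Minimum number of whole files of file_gb GB each needed to fill gap_tb TB (decimal units)."""
    return math.ceil(gap_tb * 1000 / file_gb)

for size_gb in (5, 10):
    print(f"{size_gb} GB files: {files_needed(GAP_TB, size_gb)}")
```

Roughly 285 to 570 files of the sizes observed would be enough to produce the difference. That is within reach of a single large directory.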

I have checked our Lustre parameters: qos_prio_free appears to be at its default of 90%, qos_threshold_rr is 16%, and the stripe count is 1.
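As described in the Lustre manual, the MDS allocates new file objects round-robin while free space across OSTs is roughly balanced. Once the imbalance exceeds qos_threshold_rr, it switches to weighted (QoS) allocation, and qos_prio_free controls how strongly that weighting favours free space. The sketch below is a deliberately simplified model of that switch. It is not Lustre's code. The imbalance metric, and the pure free-space weighting that ignores qos_prio_free and OSS balancing, are assumptions. All function names are hypothetical.

```python
import random

def imbalance_pct(free_tb):
    """Assumed metric: gap between the most- and least-free OST, as % of the most free."""
    most, least = max(free_tb), min(free_tb)
    return 0.0 if most <= 0 else 100.0 * (most - least) / most

def allocation_mode(free_tb, qos_threshold_rr=16.0):
    """Round-robin while OSTs are balanced; weighted once the spread exceeds the threshold."""
    return "weighted" if imbalance_pct(free_tb) > qos_threshold_rr else "round-robin"

def free_space_weights(free_tb):
    """Simplified weighting: selection probability proportional to free space only."""
    total = sum(free_tb)
    return [f / total for f in free_tb]

def pick_ost(free_tb, rng, rr_index=0, qos_threshold_rr=16.0):
    """Choose the OST index for a new stripe-count-1 file under this simplified model."""
    if allocation_mode(free_tb, qos_threshold_rr) == "round-robin":
        return rr_index % len(free_tb)
    return rng.choices(range(len(free_tb)), weights=free_space_weights(free_tb))[0]

size_tb = 7.3
balanced = [size_tb * 0.40] * 12  # every OST 60% full (40% free)
skewed = [size_tb * 0.01] + [size_tb * 0.20] * 2 + [size_tb * 0.40] * 9  # the state described above

print(allocation_mode(balanced))  # round-robin
print(allocation_mode(skewed))    # weighted
print(pick_ost(skewed, random.Random(0)))
```

In a model like this, the placement decision is made once, when a file is created. It says nothing about how large each file grows afterwards.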

Could you please suggest what might have caused this behavior, and whether there are any tunables, or better threshold values, that we could change to avoid such imbalances?

Thank you very much in advance!

--
Grigory Shamov
HPC Analyst,
University of Manitoba
Winnipeg MB Canada



