[lustre-discuss] lustre-discuss Digest, Vol 112, Issue 30

Kurt Strosahl strosahl at jlab.org
Wed Jul 29 09:03:05 PDT 2015


Hi Massimo,

     This sounds exactly like the issue I encountered over a month ago with my Lustre 2.5.3 system.  The quick fix I found was to set qos_threshold_rr to 100 (i.e. flat round-robin, not weighted allocation).  However, that caused a problem of its own: some OSTs would go over 90% full while others were still under 50%.  I was able to come up with a hack: I created a pool (called "production") that included all the OSTs except the unusable ones, and then put every directory in that pool.  Once that was done I was able to turn weighted QOS allocation back on; roughly the commands below.
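
Something along these lines (the filesystem name, pool name, and OST indices here are just placeholders for whatever fits your system):

  # on the MGS: create the pool and add the usable OSTs
  lctl pool_new cmswork.production
  lctl pool_add cmswork.production OST[0-1]

  # on a client: assign an existing directory tree to the pool
  lfs setstripe --pool production /lustre/cmswork/somedir

  # on the MDS: restore the default weighted allocation threshold
  lctl set_param lov.*.qos_threshold_rr=17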

A problem with this is that, in 2.5.3, pools are not properly inherited (https://jira.hpdd.intel.com/browse/LU-5916).  That means that new directories wouldn't get the pool information, and would thus only land on the OSTs above the bad ones.  We solved this using the changelog, which records when directories are created: we wrote some code that assigns every new directory to the production pool.  So far it seems to be working.
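
The consumer is essentially a loop like the sketch below (the MDT name, mount point, and exact changelog field layout are placeholders and may differ on your version; a real consumer should also batch and clear consumed records):

  # one-time, on the MDS: register a changelog consumer (prints an id like cl1)
  lctl --device cmswork-MDT0000 changelog_register

  # on a client: put every newly created directory into the pool
  lfs changelog cmswork-MDT0000 | while read recno type ts date flags fid rest; do
      [ "$type" = "02MKDIR" ] || continue
      dir=$(lfs fid2path /lustre/cmswork "${fid#t=}")   # t=[fid] -> pathname
      lfs setstripe --pool production "$dir"
  done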

Another issue I've since discovered: files created before the production pool existed carry no pool information, so lfs_migrate (which uses the file's striping, not the directory's) wrote the migrated files to the OSTs above the bad ones.
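
On newer Lustre versions "lfs migrate" accepts setstripe options, so the pool can be forced explicitly instead of inherited from the file's old (pool-less) layout; I can't say offhand whether the 2.5.3 tools support this, so treat it as a sketch:

  # restripe into the pool while migrating, rather than copying the old layout
  lfs migrate --pool production /lustre/cmswork/path/to/file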

w/r, 
Kurt


Message: 3
Date: Wed, 29 Jul 2015 17:31:25 +0200
From: Massimo Sgaravatto <massimo.sgaravatto at pd.infn.it>
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre doesn't use new OST
Message-ID: <55B8F1CD.5080509 at pd.infn.it>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Hi

We had a Lustre filesystem composed of 5 OSTs.
Because of a problem with 3 OSTs (the problem is described in the thread 
"Problems moving an OSS from an old Lustre installation to a new one"), 
we disabled them.

Now we want to reformat (mkfs.lustre --reformat ...) these 3 OSTs and 
bring them back online.

For the time being we performed this operation for just one OST (using a 
new index number).
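
For reference, the reformat was along these lines (the MGS NID and device path are placeholders):

  mkfs.lustre --reformat --ost --fsname=cmswork --index=5 \
      --mgsnode=<mgs_nid> /dev/<ost_device>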


The current scenario is the following (OST0005 is the reformatted OST):



lfs df -h /lustre/cmswork/
UUID                       bytes        Used   Available Use% Mounted on
cmswork-MDT0000_UUID      374.9G        3.5G      346.4G   1% /lustre/cmswork[MDT:0]
cmswork-OST0000_UUID       18.1T       14.5T        2.7T  84% /lustre/cmswork[OST:0]
cmswork-OST0001_UUID       18.1T       14.2T        3.0T  83% /lustre/cmswork[OST:1]
OST0002             : inactive device
OST0003             : inactive device
OST0004             : inactive device
cmswork-OST0005_UUID       13.6T      415.1M       12.9T   0% /lustre/cmswork[OST:5]

filesystem summary:        49.7T       28.7T       18.5T  61% /lustre/cmswork


The problem is that the "Lustre scheduler" (the MDS object allocator) is 
not selecting OST0005 at all for new files.

Only if I use "lfs setstripe --index 5" do I see the relevant files 
written to this OST; otherwise only OST0000 and OST0001 are used.

We didn't change the values of qos_threshold_rr and qos_prio_free, 
which are therefore at their defaults (17% and 91%).
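
(Checked on the MDS with something like the following; the exact parameter path may differ by version:

  lctl get_param lov.cmswork-MDT0000-mdtlov.qos_threshold_rr
  lctl get_param lov.cmswork-MDT0000-mdtlov.qos_prio_free
)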


I can't find anything useful in the log files.
Any ideas?

Thanks, Massimo

