[Lustre-discuss] OSS Service Thread Count

Oleg Drokin Oleg.Drokin at Sun.COM
Sun Jan 25 21:01:21 PST 2009


Hello!

On Jan 25, 2009, at 6:56 PM, Wojciech Turek wrote:

> For my particular case it gives 512 ost_num_threads, which is the
> Lustre maximum for this particular parameter. The manual says that
> each thread actually uses 1.5MB of RAM, so 768MB of RAM will be
> consumed on each of my OSSs for I/O threads.
> So I guess with 16GB of RAM the initial (default) value of
> ost_num_threads is already being set to 512, is that correct?
> I know that adding more OSSs and OSTs might help, but at the moment
> this isn't an option for me.
> Is there any other way I could lower the high load on the OSSs? Can
> tuning the client side help?

To decrease the load you actually want to decrease the number of OST
threads (the ost_num_threads module parameter of the ost.ko module).
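For reference, a minimal sketch of how this is set (the value 128 is
only a placeholder; see below for how to pick a real number). On each
OSS, add a line like this to /etc/modprobe.conf:

    options ost ost_num_threads=128

The new value only takes effect when the ost module is next loaded, so
the OSTs have to be unmounted and the Lustre modules reloaded.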
Essentially what is happening is that your drives can only sustain a
certain amount of parallel I/O activity before performance degrades due
to all the seeking going on. Ideally you would set the number of OST
threads to exactly that number, but this is complicated by the fact
that different workloads (i.e. different I/O sizes) change how many
parallel streams the drives can handle. In any case, once you pass that
point of congestion the performance only goes downhill: the extra
threads just wait for I/O and inflate your load-average figures. You
need to experiment a bit to see what number of threads makes sense for
you.
Perhaps start with a number of threads equal to the number of actual
disk spindles you have on that node (if you use RAID5 or higher,
subtract any spindles not used for actual data, e.g. 1/3 of the
spindles for a 3-disk RAID5), and watch the performance of the clients
during usual workloads (not the LA on the OSSes; it won't go much
higher than the max threads you specify). If you feel the performance
has degraded, try increasing the thread count somewhat and see how that
works, until performance starts degrading again or until you reach
satisfactory performance. A worked example follows below.
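
As a hypothetical worked example of the spindle counting above (all the
numbers are made up, not taken from your setup), consider an OSS with 4
OSTs, each on a 6-disk RAID5 array:

    total spindles  = 4 * 6  = 24
    parity overhead = 4 * 1  =  4   (one disk's worth of parity per array)
    data spindles   = 24 - 4 = 20

so a reasonable starting point on that OSS would be

    options ost ost_num_threads=20

While you experiment, something like "iostat -x 5" on the OSS shows
when the drives themselves saturate (watch %util and await climb as you
raise the thread count).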

If your disk configuration does not have writeback cache enabled and
your activity is mostly writes, you might also want to give the patch
from bug 16919 a try. It removes the synchronous journal commit
requirement and therefore should speed up OST writes somewhat in this
case (unless you already use a fast external journal, or unless there
is a write cache enabled that already mitigates the synchronous journal
commits).
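If you do want to try a fast external journal instead, here is a rough
sketch using the standard ext3 tools on the underlying ldiskfs devices
(the device names are placeholders: a small fast disk for the journal
and one OST device; the OST must be unmounted, and the journal device
block size must match the filesystem's):

    tune2fs -O ^has_journal /dev/sdb1        # drop the internal journal
    mke2fs -O journal_dev -b 4096 /dev/sdj1  # format the journal device
    tune2fs -j -J device=/dev/sdj1 /dev/sdb1 # attach it as the journal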

Hope that helps.

Bye,
     Oleg


