[Lustre-discuss] Service thread count parameter

Jean-Francois Le Fillatre jean-francois.lefillatre at clumeq.ca
Fri Oct 19 08:39:09 PDT 2012


Hi Andreas,

Thanks for your update! Some comments below.

JF


On Mon, Oct 15, 2012 at 7:04 PM, Dilger, Andreas
<andreas.dilger at intel.com>wrote:

> On Oct 15, 2012, at 1:01 PM, Jean-Francois Le Fillatre wrote:
> > Yes this is one strange formula... There are two ways of reading it:
> >
> > - "one thread per 128MB of RAM, times the number of CPUs in the system"
> > On one of our typical OSSes (24 GB, 8 cores), that would give:
> ((24*1024) / 128) * 8 = 1536
> > And that's waaaay out there…
>
> This formula was first created when there was perhaps 2GB of RAM and 2
> cores in the system, intended to get some rough correspondence between
> server size and thread count.  Note that there is also a default upper
> limit of 512 for threads created on the system.  However, on some systems
> in the past with slow/synchronous storage, having 1500-2000 IO threads was
> still improving performance and could be set manually.  That said, it was
> always intended as a reasonable heuristic and local performance
> testing/tuning should pick the optimal number.
>
> > - "as many threads as you can fit (128MB * numbers of CPUs) in the RAM
> of your system"
> > Which would then give: (24*1024) / (128*8) = 24
>
> This isn't actually representing what the formula calculates.
>
> > For a whole system, that's really low. But for one single OST, it almost
> makes sense, in which case you'd want to multiply that by the number of
> OSTs connected to your OSS.
>
> The rule of thumb that I've seen in the past, based on benchmarks at
> many sites, is 32 threads/OST, which will keep the low-level elevators
> busy but not make the queue depth too high.
>
> > The way we did it here: we identified that the major limiting factor is
> the software RAID, both in terms of bandwidth and CPU use. So I ran some
> tests on a spare machine to get load and performance figures for one
> array, using sgpdd-survey. Then, taking into account the number of OSTs
> per OSS (4) and the overhead of Lustre, I figured out that the best
> thread count would be around 96 (which is 24*4, spot on).
> >
> > One major limitation in Lustre 1.8.x (I don't know if it has changed in
> 2.x) is that only the global thread count for the OSS can be specified. We
> have cases where all OSS threads are used on a single OST, and that
> completely trashes the bandwidth and latency. We would really need a max
> thread count per OST too, so that no single OST would get hit that way. On
> our systems, I'd put the max OST thread count at 32 (to stay in the
> software RAID performance sweet spot) and the max OSS thread count at 96
> (to limit CPU load).
>
> Right.  This is improved in Lustre 2.3, which binds the threads to
> specific cores.  I believe it is also possible to bind OSTs to specific
> cores for PCI/HBA/HCA affinity, though I'm not 100% sure whether the
> OST/CPU binding was included or not.
>

Even if I could bind both OSTs and threads to a given CPU, that's only a
topological optimization for bandwidth and latency: what would prevent a
thread from answering a request for a target that is bound to another CPU?
This is a very nice feature, and with proper configuration it can bring
some notable performance improvements, but I fail to see how it would
solve the issue of having all threads on an OSS hammering a single OST.
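
When we suspect that's happening, the only visibility we have is through
the proc stats. A rough sketch of what we look at, assuming the 1.8-style
parameter names (adjust the paths for your version):

  # Queue depth / wait time for the whole ost_io service (all OSTs together)
  lctl get_param ost.OSS.ost_io.stats

  # Per-OST request and I/O-size distribution, to see which target takes the load
  lctl get_param obdfilter.*.stats
  lctl get_param obdfilter.*.brw_stats

That tells us after the fact that one OST soaked up the service threads,
but it doesn't let us prevent it.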

I am aware that this is a border case: in general use the load is spread
over multiple targets and there's no problem. But we've hit it here a few
times, and I know of other sites that have had the issue too. If you
combine that with RAID trouble (a slow disk, read errors, a disk failure,
a rebuild or resync), you end up with a machine that locks up so badly
that a cold reset is the only way to get it back under control.

Worst case? Yes. But because the consequences of such a situation can be so
nasty, I would be very happy to be able to control thread allocation per
OST more finely.
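
For reference, the only knobs we have today are global to the OSS. A
minimal sketch of how we pin the count at 96 on our servers, assuming the
module and parameter names from the 1.8 tuning page linked below (check
them against your version):

  # /etc/modprobe.conf -- read at module load time, so it survives reboots
  options ost oss_num_threads=96

  # Runtime cap; note that threads already started are not destroyed,
  # so lowering this only takes full effect after a reboot
  lctl set_param ost.OSS.ost_io.threads_max=96

A per-OST limit (the 32 we'd want) simply doesn't exist as a parameter,
which is the whole point.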



> > Thanks!
> > JF
> >
> >
> >
> > On Mon, Oct 15, 2012 at 2:20 PM, David Noriega <tsk133 at my.utsa.edu>
> wrote:
> > How does one estimate a good number of service threads? I'm not sure I
> > understand the following: 1 thread / 128MB * number of cpus
> >
> > On Wed, Oct 10, 2012 at 9:17 AM, Jean-Francois Le Fillatre
> > <jean-francois.lefillatre at clumeq.ca> wrote:
> > >
> > > Hi David,
> > >
> > > It needs to be specified as a module parameter at boot time, in
> > > /etc/modprobe.conf. Check the Lustre tuning page:
> > > http://wiki.lustre.org/manual/LustreManual18_HTML/LustreTuning.html
> > > http://wiki.lustre.org/manual/LustreManual20_HTML/LustreTuning.html
> > >
> > > Note that once created, the threads won't be destroyed, so if you want
> to
> > > lower your thread count you'll need to reboot your system.
> > >
> > > Thanks,
> > > JF
> > >
> > >
> > > On Tue, Oct 9, 2012 at 6:00 PM, David Noriega <tsk133 at my.utsa.edu>
> wrote:
> > >>
> > >> If the parameter ost.OSS.ost_io.threads_max is set via lctl
> > >> conf_param, will it persist between reboots/remounts?
> > >
> > >
> > >
> > >
> > > --
> > > Jean-François Le Fillâtre
> > > Calcul Québec / Université Laval, Québec, Canada
> > > jean-francois.lefillatre at clumeq.ca
> > >
> >
> >
> >
> > --
> > David Noriega
> > CSBC/CBI System Administrator
> > University of Texas at San Antonio
> > One UTSA Circle
> > San Antonio, TX 78249
> > Office: BSE 3.114
> > Phone: 210-458-7100
> > http://www.cbi.utsa.edu
> >
> > Please remember to acknowledge the RCMI grant; wording should be as
> > stated below: This project was supported by a grant from the National
> > Institute on Minority Health and Health Disparities (G12MD007591) from
> > the National Institutes of Health. Also, remember to register all
> > publications with PubMed Central.
> >
> >
> >
> > --
> > Jean-François Le Fillâtre
> > Calcul Québec / Université Laval, Québec, Canada
> > jean-francois.lefillatre at clumeq.ca
> >
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Software Architect
> Intel Corporation
>


-- 
Jean-François Le Fillâtre
Calcul Québec / Université Laval, Québec, Canada
jean-francois.lefillatre at clumeq.ca