[Lustre-discuss] Large scale obdfilter-survey issues

Tue Oct 14 13:12:21 PDT 2008

On Fri, 2008-10-03 at 15:45 -0400, David Dillow wrote:
> Has anyone run obdfilter-survey with large object/thread counts and can
> share their experiences? I am having some odd issues and am trying to
> determine if it is a setup issue on my side, or something I should file
> a bug on with Sun.

I filed bug 17382 for this, as well as a quick and dirty patch that
fixes the issue.

Currently 'lctl test_brw' exits as soon as any one of the threads exits.
When you are only running 16 GB through 512 threads with 1 MB request
sizes, each thread will only do 32 requests, so it is possible for one
thread to get slightly preferential treatment and finish well before
the rest of the pack.

One option is to increase the amount of data written by each thread,
which mitigates the issue but does not solve it. It also slows down the
test: my testing seems to indicate a need for at least 256
requests/thread for more realistic numbers in my environment, or 131 GB
per OST. More would be even better, but 256 means a potential runtime of
over 30 minutes per variable change, making large surveys painful.
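
The sizing arithmetic above can be checked with a short sketch. The
function names are made up for illustration and are not part of lctl or
obdfilter-survey; it assumes the data is split evenly across threads and
that "1 MB" means 1 MiB.

```python
MIB = 1024 * 1024  # assumption: "1 MB" request size means 1 MiB


def requests_per_thread(total_bytes, threads, request_bytes):
    """Requests each thread issues when the data is split evenly."""
    return total_bytes // (threads * request_bytes)


def data_needed(threads, requests, request_bytes):
    """Total bytes per OST so that each thread issues `requests` requests."""
    return threads * requests * request_bytes


# 16 GB through 512 threads with 1 MB requests
print(requests_per_thread(16 * 1024 * MIB, 512, MIB))

# 256 requests/thread across 512 threads, reported in MiB
print(data_needed(512, 256, MIB) // MIB)
```

The first call gives 32 requests per thread; the second gives 131072 MiB,
which is the "131 GB" figure above (128 GiB in strict binary units).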

My fix just waits for all threads to exit rather than stopping the test
early when the first thread completes. It is still a good idea to raise
the number of requests per thread to keep the workload up, but at these
scales throughput degrades to almost purely random I/O behavior under
the noop scheduler, so the increase in test duration can be kept modest.
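
To illustrate the difference between stopping at the first thread and
waiting for all of them, here is a toy sketch using Python threads. It
is not the patch attached to bug 17382 and does not reflect how lctl is
implemented; the worker, its sleep-based "requests", and all names are
hypothetical.

```python
import threading
import time


def worker(requests, delay, done_times, idx):
    """Pretend to issue `requests` I/Os, each taking `delay` seconds."""
    for _ in range(requests):
        time.sleep(delay)
    done_times[idx] = time.monotonic()


def run_all(delays, requests=4):
    """Run one worker per delay and wait for every one of them.

    Returns (time until the first thread finished,
             time until the last thread finished).
    """
    done_times = [None] * len(delays)
    threads = [
        threading.Thread(target=worker, args=(requests, d, done_times, i))
        for i, d in enumerate(delays)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all threads, not just the first to exit
    elapsed = [t - start for t in done_times]
    return min(elapsed), max(elapsed)


first, last = run_all([0.001, 0.02])
print(f"first thread done after {first:.3f}s, last after {last:.3f}s")
```

With only a few requests per thread, a thread that gets slightly faster
service finishes far ahead of the others, so a run that ends at the
first exit covers only a fraction of the intended work.
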
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office