[Lustre-discuss] xdd versus sgp_dd

David Dillow dillowda at ornl.gov
Sun May 4 13:40:38 PDT 2008


On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
> I've seen a couple of references to ORNL using xdd versus sgp_dd for  
> low-level disk performance benchmarking. Could you please summarize  
> the differences and advise if our engineering team as well as Lustre  
> partners should be considering this alternative?

We originally started using xdd for testing as it had features that made
it easy to synchronize runs involving multiple hosts -- this is
important for the testing we've been doing against LSI's XBB-2 system
and DDN's 9900. For example, the 9900 was able to hit ~1550 to 1600
MB/s against a single IB port, but each singlet topped out at ~2650 to
2700 MB/s when hit by two hosts. Getting realistic aggregate numbers
for both systems therefore requires hitting them with four IO hosts
or OSSes.

When run in direct IO (-dio) mode against the SCSI disk device on recent
kernels, xdd takes a very similar path to Lustre's use case -- building
up bio's and using submit_bio() directly, without going through the page
cache and triggering the read-ahead code and associated problems. In
this mode, xdd gave us an aggregate bandwidth of ~5500 MB/s, which
matched up nicely against the ~5000 MB/s we obtained with an IOR run
against a Lustre filesystem on the same hardware. We saw the expected
10% hit from the filesystem vs raw disk.

In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would
indicate a maximum 4400 MB/s from the array assuming perfect scaling.
That would put the filesystem result at 113.6% of raw performance,
which doesn't sit well.

That said, there are a few caveats to using xdd -- the largest being
that it does not give perfectly sequential requests when run with a
queue depth greater than 1. It uses multiple threads when it wants to
have more than 1 request in flight, and that leads to the requests being
generally ascending, but not perfectly sequential. This can cause
performance regressions when the array does not internally reorder
requests.

It is only possible to run xdd in direct IO mode against block devices
in recent kernels -- 2.6.23 I believe is the cutoff. In kernels older
than that, it must go through the page cache, and that may cause lower
performance to be measured.

Aborted shutdowns of xdd will often leave SysV semaphores orphaned,
which will require manual cleanup when you hit the system limit.

It looks like it should be possible to run xdd in a manner suitable for
sgpdd-survey so that we could run tests against multiple regions of the
disk at the same time. I've not spent any time looking closely at that
option.

I'm not sure why sgp_dd was getting lower numbers on the 2.6.24 kernel I
was testing against -- there may be a performance regression with the
SCSI generic devices.

Hope this helps, feel free to ask further questions.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
