[Lustre-discuss] xdd versus sgp_dd

Peter Bojanic Peter.Bojanic at Sun.COM
Sun May 4 17:45:24 PDT 2008


Dave, thanks for the great response -- this could easily be elaborated
into a short LCE whitepaper, btw.

I look forward to hearing from Andreas, Alex and other Lustre  
engineers on this.

Bojanic

On 4-May-08, at 17:40, David Dillow <dillowda at ornl.gov> wrote:

>
> On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
>> I've seen a couple of references to ORNL using xdd versus sgp_dd for
>> low-level disk performance benchmarking. Could you please summarize
>> the differences and advise if our engineer team as well as Lustre
>> partners should be considering this alternative?
>
> We originally started using xdd for testing as it had features that
> made it easy to synchronize runs involving multiple hosts -- this is
> important for the testing we've been doing against LSI's XBB-2 system
> and DDN's 9900. For example, the 9900 was able to hit ~1550 MB/s to
> 1600 MB/s against a single IB port, but each singlet topped out at
> ~2650 to 2700 MB/s when hit by two hosts. Getting realistic aggregate
> numbers for both systems requires that we hit them with four IO hosts
> or OSSes.
>
> When run in direct IO (-dio) mode against the SCSI disk device on
> recent kernels, xdd takes a path very similar to Lustre's use case --
> building up bios and using submit_bio() directly, without going
> through the page cache and triggering the read-ahead code and its
> associated problems. In this mode, xdd gave us an aggregate bandwidth
> of ~5500 MB/s, which matched up nicely against the ~5000 MB/s we
> obtained with an IOR run against a Lustre filesystem on the same
> hardware. We saw the expected 10% hit from the filesystem vs raw disk.
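>
> If it helps to visualize that path from userspace: as I understand it,
> -dio essentially means opening the device with O_DIRECT and issuing
> reads from aligned buffers, so the page cache and read-ahead never get
> involved. A minimal sketch (not xdd's actual code; the device path and
> sizes are just placeholders):
>
> #define _GNU_SOURCE              /* for O_DIRECT */
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> #define IOSIZE (1024 * 1024)     /* 1 MiB per request */
>
> int main(void)
> {
>     void *buf;
>     int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
>     if (fd < 0) { perror("open"); return 1; }
>
>     /* O_DIRECT needs an aligned buffer; 4096 covers most devices */
>     if (posix_memalign(&buf, 4096, IOSIZE)) { close(fd); return 1; }
>
>     ssize_t n = pread(fd, buf, IOSIZE, 0);
>     if (n < 0)
>         perror("pread");
>     else
>         printf("read %zd bytes without touching the page cache\n", n);
>
>     free(buf);
>     close(fd);
>     return 0;
> }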
>
> In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would
> indicate a maximum of ~4400 MB/s from the array assuming perfect
> scaling across the four ports. That would mean the filesystem result
> was 113.6% of raw performance (~5000 / ~4400), which doesn't sit well.
>
> That said, there are a few caveats to using xdd -- the largest being
> that it does not issue perfectly sequential requests when run with a
> queue depth greater than 1. It uses multiple threads when it wants
> more than one request in flight, which leads to requests that are
> generally ascending but not perfectly sequential. This can cause
> performance regressions when the array does not internally reorder
> requests.
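>
> To see why, here is a toy sketch (not how xdd is implemented, just an
> illustration): several workers pull "the next block" from a shared
> counter and then submit it, so the stream the device sees is generally
> ascending, but thread scheduling means it is not strictly sequential.
>
> #include <pthread.h>
> #include <stdio.h>
>
> #define NTHREADS 4
> #define NBLOCKS  32
>
> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
> static int next_block;
>
> static void *worker(void *arg)
> {
>     (void)arg;
>     for (;;) {
>         pthread_mutex_lock(&lock);
>         int blk = next_block++;
>         pthread_mutex_unlock(&lock);
>         if (blk >= NBLOCKS)
>             return NULL;
>         /* Real code would pread()/pwrite() here; the print order
>          * stands in for the order requests reach the device. */
>         printf("submit block %d\n", blk);
>     }
> }
>
> int main(void)
> {
>     pthread_t tid[NTHREADS];
>     for (int i = 0; i < NTHREADS; i++)
>         pthread_create(&tid[i], NULL, worker, NULL);
>     for (int i = 0; i < NTHREADS; i++)
>         pthread_join(tid[i], NULL);
>     return 0;
> }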
>
> It is only possible to run xdd in direct IO mode against block devices
> in recent kernels -- 2.6.23 I believe is the cutoff. In kernels older
> than that, it must go through the page cache, and that may cause lower
> performance to be measured.
>
> Aborted shutdowns of xdd will often leave SysV semaphores orphaned,
> which will require manual cleanup when you hit the system limit.
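>
> (The stale sets show up in ipcs -s and can be removed with ipcrm; for
> completeness, a minimal sketch of the same removal step in C, assuming
> you already know the semaphore id:)
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/ipc.h>
> #include <sys/sem.h>
>
> int main(int argc, char **argv)
> {
>     if (argc != 2) {
>         fprintf(stderr, "usage: %s <semid>\n", argv[0]);
>         return 1;
>     }
>     /* Remove the orphaned SysV semaphore set by numeric id */
>     if (semctl(atoi(argv[1]), 0, IPC_RMID) < 0) {
>         perror("semctl(IPC_RMID)");
>         return 1;
>     }
>     return 0;
> }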
>
> It looks like it should be possible to run xdd in a manner suitable
> for sgpdd-survey so that we could run tests against multiple regions
> of the disk at the same time. I've not spent any time looking closely
> at that option.
>
> I'm not sure why sgp_dd was getting lower numbers on the 2.6.24
> kernel I was testing against -- there may be a performance regression
> with the SCSI generic devices.
>
> Hope this helps, feel free to ask further questions.
> -- 
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office
>


