[Lustre-discuss] xdd versus sgp_dd

Andreas Dilger adilger at sun.com
Mon May 5 12:42:43 PDT 2008


On May 04, 2008  21:45 -0300, Peter Bojanic wrote:
> Dave, thanks for the great response -- this could easily be elaborated
> as a short LCE whitepaper, btw.
> 
> I look forward to hearing from Andreas, Alex and other Lustre  
> engineers on this.

I haven't personally been using sgp_dd or xdd very much, but the
requirement for kernels >= 2.6.23 pretty much rules xdd out for
most of our customers, since the latest vendor kernel (RHEL5) is
based on 2.6.18.

As for the issue of multi-threaded processes not generating perfectly
sequential IO, that is fine as well, because the way we use sgp_dd
already has similar issues, and the same is true of Lustre OSTs.

> On 4-May-08, at 17:40, David Dillow <dillowda at ornl.gov> wrote:
> 
> >
> > On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
> >> I've seen a couple of references to ORNL using xdd versus sgp_dd for
> >> low-level disk performance benchmarking. Could you please summarize
> >> the differences and advise whether our engineering team as well as
> >> Lustre partners should be considering this alternative?
> >
> > We originally started using xdd for testing as it had features that
> > made it easy to synchronize runs involving multiple hosts -- this is
> > important for the testing we've been doing against LSI's XBB-2 system
> > and DDN's 9900. For example, the 9900 was able to hit ~1550 to 1600
> > MB/s against a single IB port, but each singlet topped out at ~2650
> > to 2700 MB/s or so when hit by two hosts. Getting realistic aggregate
> > numbers for both systems requires that we hit them with four IO hosts
> > or OSSes.
> >
> > When run in direct IO (-dio) mode against the SCSI disk device on
> > recent kernels, xdd takes a very similar path to Lustre's use case --
> > building up bios and using submit_bio() directly, without going
> > through the page cache and triggering the read-ahead code and its
> > associated problems. In this mode, xdd gave us an aggregate bandwidth
> > of ~5500 MB/s, which matched up nicely against the ~5000 MB/s we
> > obtained with an IOR run against a Lustre filesystem on the same
> > hardware. We saw the expected 10% hit from the filesystem vs raw disk.
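
For anyone wanting to reproduce this, a direct-IO xdd read of the kind
Dave describes might look roughly like the following -- treat it as a
sketch, since option spellings vary between xdd releases, and the
device, block size, request size, and transfer size are placeholders:

    # 1 MB direct-IO reads (512-byte blocks x 2048 per request),
    # 8 requests in flight, 8 GB total from a raw SCSI device
    xdd -op read -targets 1 /dev/sdb -dio \
        -blocksize 512 -reqsize 2048 -mbytes 8192 \
        -queuedepth 8 -passes 1 -verbose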
> >
> > In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would
> > indicate a maximum of ~4400 MB/s from the array assuming perfect scaling.
> > That would mean the filesystem result was 113.6% of raw performance,
> > which doesn't sit well.
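
sgp_dd is the threaded dd variant from sg3_utils; a single-port read of
the sort Dave describes might look something like this (device, sizes,
and thread count are only placeholders):

    # threaded reads from an sg device: 512-byte blocks, 2048 blocks
    # (1 MB) per transfer, 4 threads, print throughput at the end
    sgp_dd if=/dev/sg2 of=/dev/null bs=512 bpt=2048 thr=4 \
        count=16777216 time=1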
> >
> > That said, there are a few caveats to using xdd -- the largest being
> > that it does not give perfectly sequential requests when run with a
> > queue depth greater than 1. It uses multiple threads when it wants to
> > have more than 1 request in flight, and that leads to the requests
> > being generally ascending, but not perfectly sequential. This can cause
> > performance regressions when the array does not internally reorder
> > requests.
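
If you need to rule that effect out, forcing a single outstanding
request should keep the stream strictly sequential, at the cost of
throughput -- again only a sketch, with the same placeholder options
as above:

    # strictly sequential: one request in flight at a time
    xdd -op read -targets 1 /dev/sdb -dio -blocksize 512 \
        -reqsize 2048 -mbytes 8192 -queuedepth 1
    # 8 in flight: faster, but offsets are only roughly ascending
    # because each worker thread issues its own requests
    xdd -op read -targets 1 /dev/sdb -dio -blocksize 512 \
        -reqsize 2048 -mbytes 8192 -queuedepth 8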
> >
> > It is only possible to run xdd in direct IO mode against block devices
> > in recent kernels -- 2.6.23 I believe is the cutoff. In kernels older
> > than that, it must go through the page cache, and that may cause lower
> > performance to be measured.
> >
> > Aborted shutdowns of xdd will often leave SysV semaphores orphaned,
> > which will require manual cleanup when you hit the system limit.
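
Cleaning those up is the usual System V IPC routine; something along
these lines works, with "benchuser" standing in for whoever ran xdd:

    # list SysV semaphore sets, then remove the leftovers by id
    ipcs -s
    ipcrm -s <semid>
    # or sweep everything owned by the benchmark user in one pass
    for id in $(ipcs -s | awk '$3 == "benchuser" {print $2}'); do
        ipcrm -s "$id"
    done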
> >
> > It looks like it should be possible to run xdd in a manner suitable
> > for sgpdd-survey so that we could run tests against multiple regions
> > of the disk at the same time. I've not spent any time looking closely
> > at that option.
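
For comparison, sgpdd-survey gets its multi-region coverage by running
several concurrent sgp_dd processes at different offsets; stripped down
to two regions with placeholder sizes, the idea is roughly:

    # two concurrent 4 GB reads from disjoint regions of the same device
    sgp_dd if=/dev/sg2 of=/dev/null bs=512 bpt=2048 thr=4 \
        skip=0 count=8388608 time=1 &
    sgp_dd if=/dev/sg2 of=/dev/null bs=512 bpt=2048 thr=4 \
        skip=8388608 count=8388608 time=1 &
    wait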
> >
> > I'm not sure why sgp_dd was getting lower numbers on the 2.6.24
> > kernel I was testing against -- there may be a performance regression
> > with the SCSI generic devices.
> >
> > Hope this helps, feel free to ask further questions.
> > -- 
> > Dave Dillow
> > National Center for Computational Science
> > Oak Ridge National Laboratory
> > (865) 241-6602 office
> >

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
