[Lustre-discuss] ldiskfs performance vs. XFS performance

Wed Oct 20 09:42:23 PDT 2010

For your final filesystem you probably still want to enable async 
journals (unless you are willing to enable the S2A unmirrored device cache).
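
A rough sketch of switching to async journal commits on a 1.8.x OSS follows. 
The parameter name obdfilter.*.sync_journal is an assumption based on the 
1.8 tunables and has not been checked against your build, so read the 
current value before changing anything. The guard only keeps the snippet 
harmless on a machine without the Lustre utilities.

```shell
#!/bin/sh
# Assumption: the OSS exposes obdfilter.*.sync_journal
# (0 = async journal commit, 1 = synchronous commit).
if command -v lctl >/dev/null 2>&1; then
    # Show the current setting for every OST on this server.
    lctl get_param obdfilter.*.sync_journal
    # Enable async journal commits (not persistent across remounts).
    lctl set_param obdfilter.*.sync_journal=0
else
    echo "lctl not found; run this on the OSS" >&2
fi
```

Rerun obdfilter-survey afterwards so the effect can be compared against 
the numbers you already have.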

Most obdecho/obdfilter-survey bugs are gone in 1.8.4, except your ctrl+c 
problem, for which a patch exists:

https://bugzilla.lustre.org/show_bug.cgi?id=21745

Cheers,
Bernd

On Wednesday, October 20, 2010, Michael Kluge wrote:
> Thanks a lot for all the replies. sgpdd shows 700+ MB/s for the device.
> We ran into one or two bugs with obdfilter-survey: lctl has at least
> one bug in 1.8.3 when it uses multiple threads, and obdfilter-survey
> also causes an LBUG when you Ctrl+C it. We see 600+ MB/s for
> obdfilter-survey over a reasonable parameter space after we changed to
> the ext4-based ldiskfs. So that seems to be the trick.
> 
> Michael
> 
> On Monday, 18.10.2010, at 14:04 -0600, Andreas Dilger wrote:
> > On 2010-10-18, at 10:40, Johann Lombardi wrote:
> > > On Mon, Oct 18, 2010 at 01:58:40PM +0200, Michael Kluge wrote:
> > >> dd if=/dev/zero of=$RAM_DEV bs=1M count=1000
> > >> mke2fs -O journal_dev -b 4096 $RAM_DEV
> > >> 
> > >> mkfs.lustre  --device-size=$((7*1024*1024*1024)) --ost --fsname=luram
> > >> --mgsnode=$MDS_NID --mkfsoptions="-E stride=32,stripe-width=256 -b
> > >> 4096 -j -J device=$RAM_DEV" /dev/disk/by-path/...
> > >> 
> > >> mount -t ldiskfs /dev/disk/by-path/... /mnt/ost_1
> > > 
> > > In fact, Lustre uses additional mount options (see "Persistent mount
> > > opts" in tunefs.lustre output). If your ldiskfs module is based on
> > > ext3, you should add the extents and mballoc options which are known
> > > to improve performance.
> > 
> > Even then, the IO submission path of ext3 from userspace is not very
> > good, and such a performance difference is not unexpected.  When
> > submitting IO from userspace to ext3/ldiskfs it is being done in 4kB
> > blocks, and each block is allocated separately (regardless of mballoc,
> > unfortunately).  When Lustre is doing IO from the kernel, the client is
> > aggregating the IO into 1MB chunks and the entire 1MB write is allocated
> > in one operation.
> > 
> > That is why we developed the "delalloc" code for ext4 - so that userspace
> > could also get better IO performance, and utilize the multi-block
> > allocation (mballoc) routines that have been in ldiskfs for ages, but
> > only accessible from the kernel.
> > 
> > For Lustre performance testing, I would suggest looking at lustre-iokit,
> > and in particular "sgpdd" to test the underlying block device, and then
> > obdfilter-survey to test the local Lustre IO submission path.
> > 
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Lustre Technical Lead
> > Oracle Corporation Canada Inc.
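
To put numbers on the 4kB versus 1MB point made in the quoted text, here is 
a small arithmetic sketch using only the values from the mkfs.lustre line 
above (-b 4096, stride=32, stripe-width=256). The variable names are made 
up for illustration, and the data-disk count is only what those two -E 
values imply, not a statement about the actual S2A tier layout.

```shell
#!/bin/sh
# Values copied from the quoted mkfs.lustre options.
block_size=4096            # -b 4096, bytes per filesystem block
stride_blocks=32           # -E stride=32, blocks per disk chunk
stripe_width_blocks=256    # -E stripe-width=256, blocks per full stripe
chunk_bytes=$((1024 * 1024))   # 1MB write aggregated by the Lustre client

# Convert the RAID geometry from blocks to KiB.
stride_kib=$((stride_blocks * block_size / 1024))
stripe_width_kib=$((stripe_width_blocks * block_size / 1024))

# Data disks implied by stripe-width / stride (an inference, see above).
data_disks=$((stripe_width_blocks / stride_blocks))

# Per-block allocations needed for 1MB when each 4kB block is allocated
# separately, versus a single multi-block allocation from the kernel path.
allocs_per_chunk=$((chunk_bytes / block_size))

echo "stride:       ${stride_kib} KiB"
echo "stripe width: ${stripe_width_kib} KiB"
echo "data disks:   ${data_disks} (implied)"
echo "allocations per 1MB: ${allocs_per_chunk} per-block vs 1 multi-block"
```

With these values a full stripe is 1024 KiB, so one 1MB write from the 
client covers exactly one stripe, while the same data written from 
userspace through the ext3-based path takes 256 separate block allocations.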

-- 
Bernd Schubert
DataDirect Networks