[Lustre-discuss] ldiskfs performance vs. XFS performance

Wed Oct 20 09:47:02 PDT 2010

> For your final filesystem you probably still want to enable async
> journals (unless you are willing to enable the S2A unmirrored device cache).

OK, thanks. We'll give this a try.

Michael
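For reference, on Lustre 1.8.x servers the async journal commit suggested above is controlled, as far as we know, by the obdfilter `sync_journal` tunable. The fragment below is a sketch to run on each OSS. A value set with `lctl set_param` applies only at runtime and is lost when the OST is remounted.

```shell
# Enable asynchronous journal commits on all local OSTs (0 = async, 1 = sync).
lctl set_param obdfilter.*.sync_journal=0

# Confirm the current setting on every local OST.
lctl get_param obdfilter.*.sync_journal
```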

> Most obdecho/obdfilter-survey bugs are gone in 1.8.4, except your ctrl+c
> problem, for which a patch exists:
>
> https://bugzilla.lustre.org/show_bug.cgi?id=21745

>
> Cheers,
> Bernd
>
>
> On Wednesday, October 20, 2010, Michael Kluge wrote:
>> Thanks a lot for all the replies. sgpdd shows 700+ MB/s for the device.
>> We ran into one or two bugs with obdfilter-survey: lctl has at least
>> one bug in 1.8.3 when it uses multiple threads, and obdfilter-survey
>> also causes an LBUG when you CTRL+C it. We see 600+ MB/s for
>> obdfilter-survey over a reasonable parameter space after we changed
>> to the ext4-based ldiskfs. So that seems to be the trick.
>>
>> Michael
>>
>> On Monday, 18.10.2010, at 14:04 -0600, Andreas Dilger wrote:
>>> On 2010-10-18, at 10:40, Johann Lombardi wrote:
>>>> On Mon, Oct 18, 2010 at 01:58:40PM +0200, Michael Kluge wrote:
>>>>> dd if=/dev/zero of=$RAM_DEV bs=1M count=1000
>>>>> mke2fs -O journal_dev -b 4096 $RAM_DEV
>>>>>
>>>>> mkfs.lustre --device-size=$((7*1024*1024*1024)) --ost --fsname=luram \
>>>>>     --mgsnode=$MDS_NID \
>>>>>     --mkfsoptions="-E stride=32,stripe-width=256 -b 4096 -j -J device=$RAM_DEV" \
>>>>>     /dev/disk/by-path/...
>>>>>
>>>>> mount -t ldiskfs /dev/disk/by-path/... /mnt/ost_1
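The `-E stride=32,stripe-width=256` values in the mkfs.lustre command above follow from the RAID geometry. Per the mke2fs documentation, stride is the RAID chunk size in filesystem blocks and stripe-width is stride times the number of data-bearing disks. The mail does not state the geometry. The sketch below assumes a 128 KiB chunk, 8 data disks and 4096-byte blocks, which is one layout that reproduces those numbers. `ext4_raid_opts` is a hypothetical helper, not part of any Lustre or e2fsprogs tool.

```shell
#!/bin/sh
# Hypothetical helper: derive ext4/ldiskfs RAID alignment options.
#   stride       = RAID chunk size / filesystem block size  (in blocks)
#   stripe-width = stride * number of data (non-parity) disks (in blocks)
# Arguments: chunk size in KiB, number of data disks, block size in bytes.
ext4_raid_opts() {
    chunk_kib=$1
    data_disks=$2
    block_bytes=$3
    stride=$(( chunk_kib * 1024 / block_bytes ))
    stripe_width=$(( stride * data_disks ))
    echo "stride=${stride},stripe-width=${stripe_width}"
}

# Assumed geometry: 128 KiB chunks, 8 data disks, 4096-byte blocks.
# Prints: stride=32,stripe-width=256
ext4_raid_opts 128 8 4096
```

The output string can be passed after `-E` inside `--mkfsoptions`.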
>>>>
>>>> In fact, Lustre uses additional mount options (see "Persistent mount
>>>> opts" in tunefs.lustre output). If your ldiskfs module is based on
>>>> ext3, you should add the extents and mballoc options which are known
>>>> to improve performance.
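Following the suggestion above, a sketch for an ext3-based ldiskfs might look like this. `OST_DEV` is a hypothetical placeholder for the real `/dev/disk/by-path/...` entry. `tunefs.lustre --dryrun` only prints the stored configuration, including the "Persistent mount opts" line, and does not modify the disk.

```shell
# Hypothetical placeholder; substitute the actual OST block device.
OST_DEV=/dev/sdX

# Print the stored Lustre configuration; look for "Persistent mount opts".
tunefs.lustre --dryrun "$OST_DEV"

# Mount the ext3-based ldiskfs directly with extents and mballoc enabled.
mount -t ldiskfs -o extents,mballoc "$OST_DEV" /mnt/ost_1
```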
>>>
>>> Even then, the IO submission path of ext3 from userspace is not very
>>> good, and such a performance difference is not unexpected.  When
>>> submitting IO from userspace to ext3/ldiskfs, it is done in 4kB
>>> blocks, and each block is allocated separately (regardless of mballoc,
>>> unfortunately).  When Lustre is doing IO from the kernel, the client is
>>> aggregating the IO into 1MB chunks and the entire 1MB write is allocated
>>> in one operation.
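As a back-of-envelope illustration of the difference described above, the sketch counts allocation operations for a write of a given size. It assumes exactly one allocation per 4 kB block on the userspace path and one per 1 MB chunk on the Lustre kernel path. The real allocator is more involved, so treat the numbers as a scale comparison only. All function names are hypothetical.

```shell
#!/bin/sh
# ceil_div A B: integer ceiling of A / B.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

block=4096              # block size on the userspace ext3/ldiskfs path
chunk=$((1024 * 1024))  # 1 MB chunk the Lustre client aggregates IO into

# Allocation operations needed for a write of $1 bytes (simplified model).
allocs_userspace() { ceil_div "$1" "$block"; }  # one per 4 kB block
allocs_lustre()    { ceil_div "$1" "$chunk"; }  # one per 1 MB chunk

size=$((1024 * 1024 * 1024))  # hypothetical 1 GiB write
echo "userspace: $(allocs_userspace "$size") allocations"
echo "lustre:    $(allocs_lustre "$size") allocations"
```

Under this model, a single 1 MB write costs 256 allocations from userspace versus one from the Lustre kernel path.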
>>>
>>> That is why we developed the "delalloc" code for ext4 - so that userspace
>>> could also get better IO performance, and utilize the multi-block
>>> allocation (mballoc) routines that have been in ldiskfs for ages but were
>>> only accessible from the kernel.
>>>
>>> For Lustre performance testing, I would suggest looking at lustre-iokit,
>>> and in particular "sgpdd" to test the underlying block device, and then
>>> obdfilter-survey to test the local Lustre IO submission path.
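A sketch of the two-step lustre-iokit procedure suggested above. lustre-iokit scripts take their parameters as environment variables. The variable names used here (`size`, `crghi`, `thrhi`, `scsidevs`, `nobjhi`, `case`) are the commonly documented ones, but all values and the `/dev/sg0` device are hypothetical. sgpdd-survey overwrites the devices it tests, so this script only prints the commands unless `DRY_RUN=0` is set. The `run` wrapper is an illustration, not part of lustre-iokit.

```shell
#!/bin/sh
# sgpdd-survey destroys data on the listed devices: default to a dry run.
DRY_RUN=${DRY_RUN:-1}

# run VAR=value ... command: print the invocation, or execute it via env.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        env "$@"
    fi
}

# Step 1: raw block device through the SCSI generic driver (hypothetical sg0).
run size=8192 crghi=16 thrhi=32 scsidevs=/dev/sg0 sgpdd-survey

# Step 2: local Lustre IO submission path on the OSS, no network involved.
run nobjhi=2 thrhi=16 size=1024 case=disk obdfilter-survey
```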
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Technical Lead
>>> Oracle Corporation Canada Inc.
>
>

-- 
Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih