[lustre-discuss] poor performance on reading small files

Dilger, Andreas andreas.dilger at intel.com
Wed Aug 3 20:40:55 PDT 2016

On Aug 3, 2016, at 19:28, Riccardo Veraldi <Riccardo.Veraldi at cnaf.infn.it> wrote:
> On 03/08/16 10:57, Dilger, Andreas wrote:
>> On Jul 29, 2016, at 03:33, Oliver Mangold <Oliver.Mangold at EMEA.NEC.COM> wrote:
>>> On 29.07.2016 04:19, Riccardo Veraldi wrote:
>>>> I am using lustre on ZFS.
>>>> While write performance is excellent even on smaller files, I find
>>>> there is a drop in performance
>>>> when reading 20KB files. Performance can go as low as 200MB/sec or even
>>>> less.
>>> Getting 200 MB/s with 20kB files means you have to do 10000 metadata
>>> ops/s. Don't want to say it is impossible to get more than that, but at
>>> least with MDT on ZFS this doesn't sound bad either. Did you run an
>>> mdtest on your system? Maybe some serious tuning of MD performance is in
>>> order.
>> I'd agree with Oliver that getting 200MB/s with 20KB files is not too bad.
>> Are you using HDDs or SSDs for the MDT and OST devices?  If using HDDs,
>> are you using SSD L2ARC to allow the metadata and file data to be cached
>> in L2ARC, and allowing enough time for L2ARC to be warmed up?
>> Are you using TCP or IB networking?  If using TCP then there is a lower
>> limit on the number of RPCs that can be handled compared to IB.
> Yes Andreas, perhaps it is not too bad. In my particular situation I am reading a bunch of 20KB chunks inside a bigger 200GB file.
> I found benefits in reducing the ZFS record size, which was initially set to quite a large value.

For large streaming writes recordsize=1024k will give the best performance,
but this is not very good for small random IO.  It is not currently possible
to explicitly change the blocksize on a per-file basis.  However, there is
some interest in being able to change this in the future with the
"lfs ladvise" command.  See https://jira.hpdd.intel.com/browse/LU-7225 if you
are interested in contributing to this work.
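
As a rough illustration of the small random IO point (a back-of-the-envelope
sketch, not a measurement): it assumes ZFS has to read whole records to satisfy
an uncached read, and that each 20KB read starts on a record boundary.  The
helper names below are hypothetical and are not part of any Lustre or ZFS tool.

```python
import math

KIB = 1024


def requests_per_second(throughput_bytes_per_s, request_bytes):
    """Request rate needed to sustain a throughput at a given request size."""
    return throughput_bytes_per_s / request_bytes


def read_amplification(recordsize_bytes, request_bytes):
    """Bytes read from disk per byte requested.

    Assumption: the request starts on a record boundary and every record
    it touches is read in full (nothing is served from cache).
    """
    records_touched = math.ceil(request_bytes / recordsize_bytes)
    return records_touched * recordsize_bytes / request_bytes


if __name__ == "__main__":
    # 200 MB/s of 20 kB reads (decimal units, as in Oliver's estimate).
    print("requests/s:", requests_per_second(200_000_000, 20_000))

    # Amplification for a 20 KiB read at a few candidate recordsize values.
    for recordsize_kib in (1024, 128, 32, 16):
        amp = read_amplification(recordsize_kib * KIB, 20 * KIB)
        print(f"recordsize={recordsize_kib}k amplification={amp:.1f}x")
```

Under these assumptions a 20KB read with recordsize=1024k pulls in roughly 50
times more data than was asked for, which is consistent with a smaller
recordsize helping this workload.  Reads that straddle a record boundary would
do somewhat worse than this estimate for the smaller record sizes.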

> I am using SSD disks and I did not set up an L2ARC because I do not think I'd have much benefit in my situation.  So it is not a Lustre problem at all.
> Thank you, I did not know about LU-4865.

The LU-4865 patch is mostly useful for starting with smaller blocksize for
small files, or files written with random IO patterns.  It will not
necessarily help if the file is written sequentially and then read in a
random pattern.

Cheers, Andreas
