[lustre-discuss] poor performance on reading small files

Dilger, Andreas andreas.dilger at intel.com
Wed Aug 3 17:01:02 PDT 2016


On Aug 3, 2016, at 12:32, Jeff Johnson <jeff.johnson at aeoncomputing.com> wrote:
> 
> On 8/3/16 10:57 AM, Dilger, Andreas wrote:
>> On Jul 29, 2016, at 03:33, Oliver Mangold <Oliver.Mangold at EMEA.NEC.COM> wrote:
>>> On 29.07.2016 04:19, Riccardo Veraldi wrote:
>>>> I am using lustre on ZFS.
>>>> 
>>>> While write performances are excellent also on smaller files, I find
>>>> there is a drop down in performance
>>>> on reading 20KB files. Performance can go as low as 200MB/sec or even
>>>> less.
>>> Getting 200 MB/s with 20kB files means you have to do 10000 metadata
>>> ops/s. Don't want to say it is impossible to get more than that, but at
>>> least with MDT on ZFS this doesn't sound bad either. Did you run an
>>> mdtest on your system? Maybe some serious tuning of MD performance is in
>>> order.
>> I'd agree with Oliver that getting 200MB/s with 20KB files is not too bad.
>> Are you using HDDs or SSDs for the MDT and OST devices?  If using HDDs,
>> are you using SSD L2ARC to allow the metadata and file data to be cached
>> in L2ARC, and allowing enough time for L2ARC to be warmed up?
>> 
>> Are you using TCP or IB networking?  If using TCP then there is a lower
>> limit on the number of RPCs that can be handled compared to IB.
> 
> Also consider the 20KB of data per LNet RPC: assuming a 1MB RPC, to move 20KB files at 200MB/sec into a non-striped LFS directory, are you using EDR for LNet? 100Gb Ethernet?

It should be clarified that even if the maximum RPC size is 1MB, Lustre will
not send more data than actually contained in the file (subject to the page
size granularity of 4KB).  However, one caveat below for ZFS...
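
As a rough illustration of the numbers in this thread, the arithmetic can
be sketched in Python.  The function names are made up for this example,
and the only assumptions are the 4KB page granularity mentioned above and
the decimal units (20kB, 200MB/s) from Oliver's estimate:

```python
PAGE_SIZE = 4 * 1024  # 4KB page granularity, as described above


def payload_bytes(file_size, page_size=PAGE_SIZE):
    """Data sent for one file: its size rounded up to whole pages."""
    pages = -(-file_size // page_size)  # integer ceiling division
    return pages * page_size


def files_per_second(throughput_bytes, file_size):
    """Files that must be read per second to sustain a given throughput."""
    return throughput_bytes / file_size


if __name__ == "__main__":
    # 200 MB/s of 20 kB files (decimal units, as in the estimate above)
    print(files_per_second(200 * 1000 * 1000, 20 * 1000))  # 10000.0
    # A 20KB (binary) file is 5 pages, so 20480 bytes, not a full 1MB RPC
    print(payload_bytes(20 * 1024))  # 20480
```

Each file also needs its own metadata operations, so this is the same
10000 ops/s figure quoted above.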

One potential issue: if ZFS is used with recordsize=1024k on the OSTs,
then without patch http://review.whamcloud.com/18441 "LU-4865 zfs: grow
block size by write pattern" the blocksize will always be 1MB on the OSTs.
If you are storing a large number of small files then this is probably not
the most efficient use of space, and it will inflate the amount of data sent
over the network as well.  Better to either apply that patch locally (and
provide feedback on how it is working), or select a recordsize that better
matches your file size (e.g. 64KB or 128KB).
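
To get a feel for the overhead, here is a small Python sketch under a
deliberately simple model: it assumes every object is stored in whole
blocks of the configured recordsize (the unpatched behaviour described
above) and ignores compression.  The helper names are hypothetical, and
this is only a back-of-the-envelope estimate, not a measurement:

```python
def allocated_bytes(file_size, recordsize):
    """Bytes occupied if the file is stored in whole recordsize blocks."""
    blocks = -(-file_size // recordsize)  # integer ceiling division
    return blocks * recordsize


def inflation(file_size, recordsize):
    """Ratio of allocated bytes to actual file bytes."""
    return allocated_bytes(file_size, recordsize) / file_size


if __name__ == "__main__":
    file_size = 20 * 1024  # a 20KB file
    for rs_kb in (64, 128, 1024):
        ratio = inflation(file_size, rs_kb * 1024)
        print(f"recordsize={rs_kb}k: {ratio:.1f}x")
```

Under this model a 20KB file is inflated about 51x at recordsize=1024k,
versus roughly 3x at 64KB and 6x at 128KB.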

Cheers, Andreas


