[lustre-discuss] poor performance on reading small files

Riccardo Veraldi Riccardo.Veraldi at cnaf.infn.it
Wed Aug 3 18:30:05 PDT 2016


Thank you, I did not know about LU-4865.
On 03/08/16 17:01, Dilger, Andreas wrote:
> On Aug 3, 2016, at 12:32, Jeff Johnson <jeff.johnson at aeoncomputing.com> wrote:
>> On 8/3/16 10:57 AM, Dilger, Andreas wrote:
>>> On Jul 29, 2016, at 03:33, Oliver Mangold <Oliver.Mangold at EMEA.NEC.COM> wrote:
>>>> On 29.07.2016 04:19, Riccardo Veraldi wrote:
>>>>> I am using lustre on ZFS.
>>>>>
>>>>> While write performance is excellent even on smaller files, I find
>>>>> there is a drop in performance
>>>>> when reading 20KB files. Performance can go as low as 200MB/sec or even
>>>>> less.
>>>> Getting 200 MB/s with 20kB files means you have to do 10000 metadata
>>>> ops/s. Don't want to say it is impossible to get more than that, but at
>>>> least with MDT on ZFS this doesn't sound bad either. Did you run an
>>>> mdtest on your system? Maybe some serious tuning of MD performance is in
>>>> order.
>>> I'd agree with Oliver that getting 200MB/s with 20KB files is not too bad.
>>> Are you using HDDs or SSDs for the MDT and OST devices?  If using HDDs,
>>> are you using SSD L2ARC to allow the metadata and file data to be cached in
>>> L2ARC, and allowing enough time for L2ARC to be warmed up?
>>>
>>> Are you using TCP or IB networking?  If using TCP then there is a lower
>>> limit on the number of RPCs that can be handled compared to IB.
>> Also consider that you are moving 20KB of data per LNet RPC, assuming a 1MB RPC. To move 20KB files at 200MB/sec into a non-striped LFS directory, are you using EDR for LNet? 100Gb Ethernet?
> It should be clarified that even if the maximum RPC size is 1MB, Lustre will
> not send more data than actually contained in the file (subject to the page
> size granularity of 4KB).  However, one caveat below for ZFS...
>
> One potential issue: if ZFS with recordsize=1024k is used on the OSTs,
> then without patch http://review.whamcloud.com/18441 "LU-4865 zfs: grow
> block size by write pattern" the blocksize will always be 1MB on the OSTs.
> If you are storing a large number of small files then this is probably not
> the most efficient use of space, and it will inflate the amount of data sent
> over the network as well.  Better to either apply that patch locally (and
> provide feedback on how it is working), or select a recordsize that better
> matches your file size (e.g. 64KB or 128KB).
>
> Cheers, Andreas
>
>
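The numbers discussed above can be checked with a few lines of arithmetic. This is only a rough sketch: the helper names are made up for illustration, the file rate uses decimal units (as in Oliver's estimate), and the block-size comparison assumes the worst case in which every 20KB file occupies one full ZFS record with compression ignored. It is not a measurement of any real system.

```python
def files_per_second(throughput_bytes_per_s, file_size_bytes):
    """Files that must be opened and read each second to sustain a throughput."""
    return throughput_bytes_per_s / file_size_bytes


def round_up(size_bytes, granularity_bytes):
    """Round a size up to the next multiple of a granularity (integer math)."""
    return -(-size_bytes // granularity_bytes) * granularity_bytes


# 200 MB/s of 20 kB files, decimal units
rate = files_per_second(200_000_000, 20_000)
print(f"files per second: {rate:.0f}")

file_size = 20 * 1024   # one 20KB file, binary units
page_size = 4 * 1024    # 4KB page granularity mentioned for Lustre RPCs
print(f"bytes at 4KB granularity: {round_up(file_size, page_size)}")

# Assumed worst case: each small file takes one full record of this size.
for record_kb in (64, 128, 1024):
    record = record_kb * 1024
    allocated = round_up(file_size, record)
    print(f"recordsize={record_kb}k: {allocated} bytes, "
          f"{allocated / file_size:.1f}x the file size")
```

Under these assumptions a 1024k record is about 51 times the size of a 20KB file, while 64k or 128k records are 3.2 and 6.4 times the file size.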


