[Lustre-discuss] tuning for small I/O

Sheila Barthel Sheila.Barthel at Sun.COM
Fri Jan 8 07:31:27 PST 2010


Also, the Lustre manual includes a section on improving performance when 
working with small files:

http://manual.lustre.org/manual/LustreManual18_HTML/LustreTroubleshootingTips.html#50532481_pgfId-1291398

Sheila

Atul Vidwansa wrote:
> Hi Jay,
>
> There are multiple ways to tune Lustre for small I/O. If you search 
> the lustre-discuss archives, you will find many threads on the same 
> topic. I have some suggestions below:
>
> Jay Christopherson wrote:
>   
>> I'm attempting to run a pair of ActiveMQ java instances, using a 
>> shared Lustre filesystem mounted with flock for failover purposes.  
>> There's lots of ways to do ActiveMQ failover and shared filesystem 
>> just happens to be the easiest.
>>
>> ActiveMQ, at least the way we are using it, does a lot of small I/Os, 
>> around 600 - 800 IOPS of 6 KB I/Os.  When I attempt to use Lustre 
>> as the shared filesystem, I see major I/O wait time on the CPUs, around 
>> 40 - 50%.  My OSSs and MDS don't seem to be particularly busy, staying 
>> 90% idle or more while this is running.  If I remove Lustre from the 
>> equation and simply write to local disk OR to an iSCSI-mounted SAN 
>> disk, my ActiveMQ instances don't seem to have any problems.
>>
>> The disks backing the OSSs are all 15K RPM SAS disks in a RAID1 
>> config.  The two OSSs each have 8 GB of memory and 4 CPU cores 
>> and are doing nothing else except being OSSs.  The MDS has one CPU 
>> and 4 GB of memory and is 98% idle while under this ActiveMQ load.  
>> The network I am using for Lustre is dedicated gigabit ethernet, and 
>> there are 8 clients, two of which are these ActiveMQ clients.
>>     
> First of all, I would suggest benchmarking your Lustre setup for a 
> small-file workload. For example, use Bonnie++ in IOPS mode to create 
> small files on Lustre. That will tell you the limits of your Lustre 
> setup. I got about 6000 creates/sec on my 12-disk (Seagate SAS 15K 
> RPM 300 GB) RAID10 setup.
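A concrete invocation of Atul's Bonnie++ suggestion might look like the following (a sketch only; the mount point and file counts are assumptions, not from the thread):

```shell
# Small-file IOPS benchmark with Bonnie++ (sketch).
#   -s 0                skips the large-file throughput tests
#   -n 10:6144:6144:16  creates 10*1024 files of 6144 bytes (~6 KB,
#                       matching the ActiveMQ I/O size) across 16 dirs
# /mnt/lustre/bench is a hypothetical Lustre-backed directory.
if command -v bonnie++ >/dev/null 2>&1; then
  bonnie++ -d /mnt/lustre/bench -s 0 -n 10:6144:6144:16
  BENCH_STATUS=ran
else
  # Guard so the sketch is a no-op on hosts without bonnie++ installed.
  BENCH_STATUS=skipped
fi
```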
>
>   
>> So, my question is:
>>
>> 1.  What should I be looking at to tune my Lustre FS for this type of 
>> IO?  I've tried upping the lru_size of the MDT and OST namespaces in 
>> /proc/fs/lustre/ldlm to 5000 and 2000 respectively, but I don't really 
>> see much difference.  I have also ensured that striping is disabled 
>> (lfs setstripe -d) on the shared directory.
>>     
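For reference, the tunings Jay describes can be expressed as commands like these (a sketch; the directory path is hypothetical, and on Lustre 1.8 the lru_size values can also be written directly under /proc/fs/lustre/ldlm):

```shell
# Apply the lock-LRU and striping settings Jay mentions (sketch; run as
# root on a Lustre client). Guarded so it is a no-op without Lustre tools.
if command -v lctl >/dev/null 2>&1; then
  # Enlarge the client's lock LRUs: MDT (metadata) and OST (data) locks.
  lctl set_param ldlm.namespaces.*mdc*.lru_size=5000
  lctl set_param ldlm.namespaces.*osc*.lru_size=2000
  # Remove any default stripe layout from the shared directory
  # (hypothetical path) so new files use the filesystem default.
  lfs setstripe -d /mnt/lustre/activemq
  TUNE_STATUS=applied
else
  TUNE_STATUS=skipped
fi
```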
> Try disabling Lustre debug messages on all clients:
>
> sysctl -w lnet.debug=0
>
> Try increasing dirty cache on client nodes:
>
> lctl set_param osc.*.max_dirty_mb=256
>
> Also, you can bump max_rpcs_in_flight from 8 to 32, but given that you 
> have a gigabit ethernet network, I don't think it will improve performance.
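Taken together, Atul's client-side suggestions amount to something like the following (a sketch, to be run as root on each Lustre client; the rpcs-in-flight bump is included even though he expects little gain on gigabit ethernet):

```shell
# Client-side small-I/O tunings from this thread (sketch; run as root
# on each Lustre client). Guarded for hosts without Lustre utilities.
if command -v lctl >/dev/null 2>&1; then
  sysctl -w lnet.debug=0                       # disable Lustre debug logging
  lctl set_param osc.*.max_dirty_mb=256        # more dirty cache per OSC
  lctl set_param osc.*.max_rpcs_in_flight=32   # default is 8
  CLIENT_TUNE=applied
else
  CLIENT_TUNE=skipped
fi
```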
>
> Cheers,
> -Atul
>   
>> I guess I am just not experienced enough with Lustre yet to know how 
>> to track down and resolve this issue.  I would think Lustre should be 
>> able to handle this load, but I must be missing something.  For the 
>> record, NFS was not able to handle this load either, at least with 
>> default export settings (async improved things, but async is not an 
>> option).
>>
>> - Jay
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss