[Lustre-discuss] tuning for small I/O
Sheila Barthel
Sheila.Barthel at Sun.COM
Fri Jan 8 07:31:27 PST 2010
Also, the Lustre manual includes a section on improving performance when
working with small files:
http://manual.lustre.org/manual/LustreManual18_HTML/LustreTroubleshootingTips.html#50532481_pgfId-1291398
Sheila
Atul Vidwansa wrote:
> Hi Jay,
>
> There are multiple ways to tune Lustre for small IO. If you search the
> lustre-discuss archives, you will find many threads on the same topic. I
> have some suggestions below:
>
> Jay Christopherson wrote:
>
>> I'm attempting to run a pair of ActiveMQ java instances, using a
>> shared Lustre filesystem mounted with flock for failover purposes.
>> There's lots of ways to do ActiveMQ failover and shared filesystem
>> just happens to be the easiest.
>>
>> ActiveMQ, at least the way we are using it, does a lot of small I/Os,
>> around 600 - 800 IOPS worth of 6K I/Os. When I attempt to use Lustre
>> as the shared filesystem, I see major I/O wait time on the CPUs, around
>> 40 - 50%. My OSSs and MDS don't seem to be particularly busy, being
>> 90% idle or more while this is running. If I remove Lustre from the
>> equation and simply write to local disk OR to an iSCSI-mounted SAN
>> disk, my ActiveMQ instances don't have any problems.
>>
>> The disks backing the OSSs are all 15K SAS disks in a RAID1 config.
>> The OSSs (2 of them) each have 8 GB of memory and 4 CPU cores and are
>> doing nothing else except being OSSs. The MDS has one CPU and 4 GB of
>> memory and is 98% idle while under this ActiveMQ load. The network I
>> am using for Lustre is dedicated gigabit Ethernet and there are 8
>> clients, two of which are these ActiveMQ clients.
>>
> First of all, I would suggest benchmarking your Lustre setup for a
> small-file workload. For example, use Bonnie++ in IOPS mode to create
> small files on Lustre. That will tell you the limit of your Lustre
> setup. I got about 6000 creates/sec on my 12-disk (Seagate SAS 15K RPM
> 300 GB) RAID10 setup.
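A Bonnie++ run along these lines exercises only the small-file create/stat/delete phase; the mount point, file count, and file sizes below are illustrative, so adjust them to match your setup:

```shell
# -s 0 skips the large-file throughput tests, leaving only the
# small-file (create/stat/unlink) phase.
# -n 128:8192:4096:16 creates 128*1024 files between 4 KB and 8 KB,
# spread over 16 directories, roughly matching ~6K records.
# -u nobody is needed only when running as root.
bonnie++ -d /mnt/lustre/bench -s 0 -n 128:8192:4096:16 -u nobody
```

The creates/sec figure it reports gives a ceiling to compare the ActiveMQ IOPS numbers against.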
>
>
>> So, my question is:
>>
>> 1. What should I be looking at to tune my Lustre FS for this type of
>> IO? I've tried upping the lru_size of the MDT and OST namespaces in
>> /proc/fs/lustre/ldlm to 5000 and 2000 respectively, but I don't really
>> see much difference. I have also ensured that striping is disabled
>> (lfs setstripe -d) on the shared directory.
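For reference, the lock-LRU and striping settings described above map onto commands along these lines (the directory path is illustrative; the lru_size values are the ones quoted above):

```shell
# Enlarge the client-side DLM lock LRU for the MDT (mdc) and OST (osc)
# namespaces; values mirror the 5000/2000 tried above.
lctl set_param ldlm.namespaces.*mdc*.lru_size=5000
lctl set_param ldlm.namespaces.*osc*.lru_size=2000

# Delete any default striping on the shared directory so new files
# inherit the filesystem default; the path is illustrative.
lfs setstripe -d /mnt/lustre/shared
```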
>>
> Try disabling Lustre debug messages on all clients:
>
> sysctl -w lnet.debug=0
>
> Try increasing dirty cache on client nodes:
>
> lctl set_param osc.*.max_dirty_mb=256
>
> Also, you can bump up max_rpcs_in_flight from 8 to 32, but given that
> you have a gigabit Ethernet network, I don't think it will improve
> performance.
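The RPCs-in-flight change mentioned above is likewise an lctl tunable on the client nodes; a sketch, using the value of 32 suggested above:

```shell
# Allow each client OSC to keep up to 32 RPCs in flight per OST
# (the default is 8); run this on the client nodes.
lctl set_param osc.*.max_rpcs_in_flight=32
```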
>
> Cheers,
> -Atul
>
>> I guess I am just not experienced enough yet with Lustre to know how
>> to track down and resolve this issue. I would think Lustre should be
>> able to handle this load, but I must be missing something. For the
>> record, NFS was not able to handle this load either, at least with
>> default export settings (async exports improved things, but async is
>> not an option for us).
>>
>> - Jay
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>