[Lustre-discuss] tuning for small I/O
Atul Vidwansa
Atul.Vidwansa at Sun.COM
Fri Jan 8 01:49:34 PST 2010
Hi Jay,
There are multiple ways to tune Lustre for small I/O. If you search the
lustre-discuss archives, you will find many threads on the same topic. I
have some suggestions below:
Jay Christopherson wrote:
> I'm attempting to run a pair of ActiveMQ java instances, using a
> shared Lustre filesystem mounted with flock for failover purposes.
> There's lots of ways to do ActiveMQ failover and shared filesystem
> just happens to be the easiest.
>
> ActiveMQ, at least the way we are using it, does a lot of small I/Os,
> around 600-800 IOPS of 6 KB writes. When I attempt to use Lustre
> as the shared filesystem, I see major IO wait time on the cpu's, like
> 40 - 50%. My OSS's and MDS don't seem to be particularly busy being
> 90% idle or more while this is running. If I remove Lustre from the
> equation and simply write to local disk OR to an iSCSI mounted SAN
> disk, my ActiveMQ instances don't seem to have any problems.
>
> The disks backing the OSSes are all 15K SAS disks in a RAID1
> config. The OSSes (2 of them) each have 8GB of memory and 4 cpu cores
> and are doing nothing else except being OSS's. The MDS has one cpu
> and 4G of memory and is 98% idle while under this ActiveMQ load. The
> network I am using for Lustre is dedicated gigabit ethernet and there
> are 8 clients, two of which are these ActiveMQ clients.
First of all, I would suggest benchmarking your Lustre setup for a
small-file workload. For example, use Bonnie++ in IOPS mode to create
small files on Lustre. That will tell you the limits of your Lustre
setup. I got about 6000 creates/sec on my 12-disk (Seagate SAS 15K RPM
300 GB) RAID10 setup.
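A small-file creation run with Bonnie++ might look like the sketch below. The mount point and file counts are illustrative assumptions, not values from this thread; adjust them for your setup:

```shell
# Small-file IOPS benchmark with Bonnie++ (illustrative parameters).
# -d      test directory on the Lustre mount (hypothetical path)
# -s 0    skip the large-file throughput tests, run only file operations
# -n 64:6144:6144:16
#         create 64*1024 files, each exactly 6 KB (max=min=6144 bytes),
#         spread across 16 subdirectories
# -u      user to run as (required when invoked as root)
bonnie++ -d /mnt/lustre/benchmark -s 0 -n 64:6144:6144:16 -u nobody
```

The create/stat/delete rates in the output give a rough ceiling for what a small-I/O application like ActiveMQ can expect from the filesystem.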
>
> So, my question is:
>
> 1. What should I be looking at to tune my Lustre FS for this type of
> IO? I've tried upping the lru_size of the MDT and OST namespaces in
> /proc/fs/lustre/ldlm to 5000 and 2000 respectively, but I don't really
> see much difference. I have also ensured that striping is disabled
> (lfs setstripe -d) on the shared directory.
Try disabling Lustre debug messages on all clients:
sysctl -w lnet.debug=0
Try increasing dirty cache on client nodes:
lctl set_param osc.*.max_dirty_mb=256
Also, you can bump up the max RPCs in flight from 8 to 32, but given
that you have a gigabit ethernet network, I don't think it will improve
performance.
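Taken together, the client-side tunings above can be applied as in the sketch below. The values are starting points rather than tested limits, and they should be applied on each client:

```shell
# Client-side Lustre tuning sketch -- values are starting points.
# Disable Lustre debug logging (saves CPU on the client I/O path):
sysctl -w lnet.debug=0
# Allow more dirty data to be cached per OSC before writeback (MB):
lctl set_param osc.*.max_dirty_mb=256
# Allow more concurrent RPCs per OSC (the default is 8):
lctl set_param osc.*.max_rpcs_in_flight=32
# Note: set_param values do not persist across a remount,
# so reapply them after mounting the filesystem.
```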
Cheers,
-Atul
>
> I guess I am just not experienced enough yet with Lustre to know how
> to track down and resolve this issue. I would think Lustre should be
> able to handle this load, but I must be missing something. For the
> record, NFS was not able to handle this load either, at least with
> default export settings (async improved things, but async is not an
> option).
>
> - Jay
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>