[Lustre-discuss] tuning for small I/O

Atul Vidwansa Atul.Vidwansa at Sun.COM
Fri Jan 8 01:49:34 PST 2010


Hi Jay,

There are multiple ways to tune Lustre for small I/O. If you search the 
lustre-discuss archives, you will find many threads on the same topic. I 
have some suggestions below:

Jay Christopherson wrote:
> I'm attempting to run a pair of ActiveMQ java instances, using a 
> shared Lustre filesystem mounted with flock for failover purposes.  
> There are lots of ways to do ActiveMQ failover, and a shared filesystem 
> just happens to be the easiest.
>
> ActiveMQ, at least the way we are using it, does a lot of small I/Os, 
> around 600 - 800 IOPS worth of 6K I/Os.  When I attempt to use Lustre 
> as the shared filesystem, I see major I/O wait time on the CPUs, like 
> 40 - 50%.  My OSSs and MDS don't seem to be particularly busy, being 
> 90% idle or more while this is running.  If I remove Lustre from the 
> equation and simply write to local disk OR to an iSCSI-mounted SAN 
> disk, my ActiveMQ instances don't have any problems.
>
> The disks backing the OSSs are all SAS 15K disks in a RAID1 
> config.  The OSSs (two of them) each have 8GB of memory and 4 CPU cores 
> and are doing nothing else except being OSSs.  The MDS has one CPU 
> and 4GB of memory and is 98% idle while under this ActiveMQ load.  The 
> network I am using for Lustre is dedicated gigabit ethernet and there 
> are 8 clients, two of which are these ActiveMQ clients.
First of all, I would suggest benchmarking your Lustre setup for a 
small-file workload. For example, use Bonnie++ in IOPS mode to create 
small files on Lustre. That will tell you the limit of your Lustre setup. 
I got about 6000 creates/sec on my 12-disk (Seagate SAS 15K RPM 300 GB) 
RAID10 setup.
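
As a rough sketch (the mount point, file counts, and user below are just 
placeholders to adapt to your environment), something like this runs only 
the small-file creation/IOPS phase of Bonnie++:

# -s 0 skips the large-file tests; -n creates 16*1024 files of 4-8 KB
# spread over 16 directories under the given path
bonnie++ -d /mnt/lustre/benchdir -s 0 -n 16:8192:4096:16 -u nobody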

>
> So, my question is:
>
> 1.  What should I be looking at to tune my Lustre FS for this type of 
> IO?  I've tried upping the lru_size of the MDT and OST namespaces in 
> /proc/fs/lustre/ldlm to 5000 and 2000 respectively, but I don't really 
> see much difference.  I have also ensured that striping is disabled 
> (lfs setstripe -d) on the shared directory.
Try disabling Lustre debug messages on all clients:

sysctl -w lnet.debug=0
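
A quick way to confirm the change on a client (the exact proc path can 
vary between Lustre versions, so treat this as a sketch):

lctl get_param debug
cat /proc/sys/lnet/debug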

Try increasing dirty cache on client nodes:

lctl set_param osc.*.max_dirty_mb=256
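
You can read the value back per OSC to confirm it took effect:

lctl get_param osc.*.max_dirty_mb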

Also, you can bump up max RPCs in flight from 8 to 32, but given that you 
have a gigabit ethernet network, I don't think it will improve performance.
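
If you do want to experiment with it, the tunable is along these lines:

lctl set_param osc.*.max_rpcs_in_flight=32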

Cheers,
-Atul
>
> I guess I am just not experienced enough yet with Lustre to know how 
> to track down and resolve this issue.  I would think Lustre should be 
> able to handle this load, but I must be missing something.  For the 
> record, NFS was not able to handle this load either, at least with 
> default export settings (async improved things, but async is not an option).
>
> - Jay