[lustre-discuss] problem getting high performance output to single file

Tue May 19 10:59:04 PDT 2015

> On May 19, 2015, at 1:44 PM, Schneider, David A. <davidsch at slac.stanford.edu> wrote:
> 
> Thanks for the suggestion! When I had each rank run on a separate compute node/host, I saw parallel performance (4 seconds for the 6GB of writing). When I ran the MPI job on one host (the hosts have 12 cores; by default we pack ranks onto as few hosts as possible), things happened serially, with each rank finishing about 2 seconds after another.

Hmm. That does suggest a client-side bottleneck that is limiting the throughput from a single client. Here are some things you could look into (although some of them might require more tinkering than you have permission to do):

1) Based on your output from “lctl list_nids”, it looks like you are running IP-over-IB. Can you configure the clients to use RDMA? (They would then have NIDs like x.x.x.x@o2ib.)

2) Do you have the option of trying a newer client version? Earlier Lustre versions used a single-threaded ptlrpcd to manage network traffic, but newer versions have a multi-threaded implementation. You may need to check compatibility with the Lustre version running on the servers, though.

3) Do you have checksums disabled? Try running “lctl get_param osc.*.checksums”. If the values are “1”, then checksums are enabled, which can slow down performance. You could try setting the value to “0” (e.g. “lctl set_param osc.*.checksums=0”) to see if that helps.
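If you want to run checks 1) and 3) on several clients at once, here is a rough sketch in Python that parses the text printed by “lctl list_nids” and “lctl get_param osc.*.checksums”. It assumes the usual address@network and param=value output formats. The function names and the sample device names are hypothetical and are not part of Lustre. Only run_lctl() actually calls lctl, and it needs to run on a Lustre client.

```python
import subprocess


def nid_networks(list_nids_output: str) -> list[str]:
    """Return the LNet network type of each NID, e.g. 'tcp' or 'o2ib'.

    Assumes one NID per line in the form address@network, as printed
    by `lctl list_nids`.
    """
    nets = []
    for line in list_nids_output.splitlines():
        line = line.strip()
        if "@" in line:
            net = line.split("@", 1)[1]
            # Drop a trailing network number, e.g. 'o2ib1' -> 'o2ib', 'tcp0' -> 'tcp'.
            nets.append(net.rstrip("0123456789"))
    return nets


def osc_checksums(get_param_output: str) -> dict[str, bool]:
    """Map each OSC device name to True if checksums are enabled.

    Assumes lines of the form osc.<device>.checksums=<0|1>, as printed
    by `lctl get_param osc.*.checksums`.
    """
    result = {}
    for line in get_param_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or not key.startswith("osc.") or not key.endswith(".checksums"):
            continue  # skip blank or unrelated lines
        device = key[len("osc."):-len(".checksums")]
        result[device] = value.strip() == "1"
    return result


def run_lctl(*args: str) -> str:
    """Run lctl with the given arguments and return its stdout (Lustre client only)."""
    return subprocess.run(["lctl", *args], capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__":
    # Hypothetical sample output, so this demo runs without a Lustre client.
    sample_nids = "10.0.0.5@tcp\n"
    sample_cksum = ("osc.scratch-OST0000-osc-ffff8800aa.checksums=1\n"
                    "osc.scratch-OST0001-osc-ffff8800aa.checksums=0\n")
    print(nid_networks(sample_nids))   # ['tcp'] means IP-over-IB, not o2ib/RDMA
    enabled = [d for d, on in osc_checksums(sample_cksum).items() if on]
    print("OSCs with checksums enabled:", enabled)
```

On a real client you would pass run_lctl("list_nids") and run_lctl("get_param", "osc.*.checksums") to the two parsers instead of the sample strings. Keeping the parsing separate from the lctl call makes it easy to test against saved output.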

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu