[lustre-discuss] Using Lustre for distributed build use case -- Performance tuning suggestions

Dilger, Andreas andreas.dilger at intel.com
Tue May 29 15:51:32 PDT 2018


Unless you have huge directories, you may not see any improvement from DNE, and it may hurt performance because striped directories have more overhead when they are first created.

DNE is mostly useful when a single MDS is overloaded by many clients, but with the small IO workload here that may not be the case.
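
For comparison, directories can be created on a single MDT or striped across several with lfs; a minimal sketch, with the paths and stripe count only as examples:

lfs mkdir -i 0 /mnt/lustre/src            # plain directory placed on MDT0000 (no striping overhead)
lfs mkdir -c 2 /mnt/lustre/src-striped    # DNE phase II striped directory spanning 2 MDTs
lfs getdirstripe /mnt/lustre/src-striped  # show how the directory is striped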

Also, you would likely benefit from IB networking, which has lower latency than TCP.
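
A minimal LNet sketch, assuming an ib0 interface on servers and clients (the interface name, NID and fsname are placeholders):

# /etc/modprobe.d/lustre.conf on servers and clients
options lnet networks="o2ib0(ib0)"
# clients then mount via the MGS's InfiniBand NID, e.g.:
mount -t lustre <mgs-nid>@o2ib0:/<fsname> /mnt/lustre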

Cheers, Andreas

On May 29, 2018, at 17:26, meng ding <dingmeng at gmail.com> wrote:

Hi,

We are in the process of helping a client evaluate shared file system solutions for their distributed build use case. Our client is a company with hundreds of developers around the globe doing system software development around the clock. At any time there could be many builds running, including interactive builds (mostly incremental builds) and batch builds (mostly regression tests).

The builds are very large. For example, a single full build may have more than 6000 build tasks (i.e., make rules) that can all be run in parallel. Each build task takes about 6 seconds on average to run, so a sequential build using 1 CPU (e.g., make -j 1) would take roughly 10 hours to complete (6000 tasks x 6 s = 36,000 s).

Our client uses distributed build software to run the build across a cluster of build hosts. Think of make -j N, except that the build tasks run simultaneously on multiple hosts instead of one. Obviously, for this to work they need a shared file system with good performance. Our client currently uses NFS on NetApp, which provides good performance most of the time, but at a very high cost. With this combination, our client can complete the build described above in less than 5 minutes on a build cluster of about 30 build hosts (25 cores per host). Another advantage of a build cluster is that it can accommodate many builds from multiple developers at the same time, with each developer dynamically assigned a fair share of the total cores at any given time based on the resource requirements of each build.

The distributed build use case has the following characteristics:

  *   Mostly very small source files (tens of thousands of them), each under 16 KB, to start with.
  *   Source files are read-only. All reads are sequential.
  *   Source files are read repetitively (e.g., the header files), so the workload can benefit hugely from client-side caching (the relevant client cache settings are sketched after this list).
  *   Intermediate object files, libraries, and binary files are small to medium in size; the biggest binary generated is several hundred megabytes.
  *   Binary/object files are generated by small random writes.
  *   There is NO concurrent/shared access to the same file. Each build task generates its own output file.
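
For reference, the Lustre client cache and read-ahead limits that matter for this repetitive-read pattern can be inspected and raised with lctl; the values below are only illustrative, not settings we have validated:

lctl get_param llite.*.max_cached_mb                   # total page cache Lustre may use on the client
lctl set_param llite.*.max_read_ahead_mb=256           # illustrative: overall read-ahead limit per client
lctl set_param llite.*.max_read_ahead_per_file_mb=64   # illustrative: read-ahead limit per file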

With this use case in mind, we are trying to explore alternative solutions to NFS on NetApp, with the goal of achieving comparable performance at reduced cost. So far, we have done some benchmarking with Lustre on a distributed build of GCC 8.1 on AWS, but its performance lags quite a bit behind even kernel NFS:

Lustre Setup

Lustre Server

  *   2 MDSes, each an m5.2xlarge instance (8 vCPUs, 32 GiB memory, up to 10 Gb network), backed by an 80 GiB SSD formatted with LDISKFS.
  *   DNE phase II (striped directories) is enabled.
  *   No data striping is enabled because most files are small (a sketch of such a single-stripe default layout follows this list).
  *   4 OSSes, each an m5.xlarge instance (4 vCPUs, 16 GiB memory, up to 10 Gb network), backed by a 40 GiB SSD formatted with LDISKFS.
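
For reference, a single-stripe default layout of this kind can be set and verified roughly as follows (the mount point is illustrative):

lfs setstripe -c 1 /mnt/lustre      # default layout: one stripe (one OST object) per new file
lfs getstripe -d /mnt/lustre        # show the default layout at the file system root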
Build cluster
30 build hosts (m5.xlarge), 120 vCPUs in total, all mounting the same Lustre file system.

  *   The following is configured on all build hosts:
mount -t lustre -o localflock …               # localflock: flock() calls are honored locally on each client only (no cross-client lock coherency)
lctl set_param osc.*.checksums=0              # disable wire data checksums to cut client/server CPU overhead
lctl set_param osc.*.max_rpcs_in_flight=32    # allow more concurrent RPCs per OSC (default is 8)
lctl set_param osc.*.max_dirty_mb=128         # more dirty write cache per OSC before writes block

Test and results
Running a distributed build of GCC 8.1 on the Lustre mount across the build cluster:

Launching 1 build only:

  *   Takes on average 17 minutes 45 seconds to finish.

Launching 20 builds at the same time all sharing the same build cluster:

  *   Takes on average 46 minutes to finish for each build.

By the way, we have also tried the Data-on-MDT feature since we are using Lustre 2.11, but we did not observe any performance improvement.
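
For reference, a DoM layout of the kind we mean is set roughly like this; the 64 KiB MDT component size and the path are only examples, not necessarily what we used:

lfs setstripe -E 64K -L mdt -E -1 /mnt/lustre/build   # first 64 KiB of each new file on the MDT, remainder on OSTs
lfs getstripe /mnt/lustre/build                       # verify the composite (DoM) layout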

Kernel NFS Setup

NFS Server
1 NFS server, an m5.2xlarge (8 vCPUs, 32 GiB memory, up to 10 Gb network), backed by a 300 GiB SSD formatted with XFS.

Build cluster
30 build hosts (m5.xlarge), 120 vCPUs in total, all mounting the same NFS volume using the NFSv3 protocol.

Test and results
Running a distributed build of GCC 8.1 on the NFS mount across the build cluster:
Launching 1 build only:

  *   Takes on average 16 minutes 36 seconds to finish. About 1 minute faster than Lustre.

Launching 20 builds at the same time all sharing the same build cluster:

  *   Takes on average 38 minutes to finish for each build. About 8 minutes faster than Lustre.


So our question to the Lustre experts: given the distributed build use case, do you suggest anything else we can try to potentially improve the performance further?

Thanks,
ading
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org