[lustre-discuss] Using Lustre for distributed build use case -- Performance tuning suggestions

meng ding dingmeng at gmail.com
Tue May 29 08:26:40 PDT 2018


Hi,

We are in the process of helping a client evaluate shared file system
solutions for their distributed build use case. Our client is a company
with hundreds of developers around the globe doing system software
development around the clock. At any time there may be many builds
running, both interactive builds (mostly incremental builds) and batch
builds (mostly regression tests).

The builds are very large. For example, a single full build may comprise
more than 6000 build tasks (i.e., make rules) that can all run in
parallel, each taking about 6 seconds on average. A sequential build
using 1 CPU (e.g., make -j 1) would therefore take about 10 hours to
complete (6000 x 6 s = 36,000 s).

Our client uses distributed build software to run the build across a
cluster of build hosts. Think of make -j N, except the build tasks run
simultaneously on multiple hosts instead of one. Obviously, for this to
work they need a shared file system with good performance. Our client
currently uses NFS on NetApp, which provides good performance most of
the time, but at a very high cost. With this combination, our client can
complete the build described above in under 5 minutes on a build cluster
of about 30 build hosts (25 cores per host). Another advantage of a
build cluster is that it can accommodate many builds from multiple
developers at the same time: each developer is dynamically assigned a
fair share of the cluster's cores based on the resource requirements of
their build.

The distributed build use case has the following characteristics:

   - Mostly very small source files (tens of thousands of them), each
   under 16 KB to start with.
   - Source files are read-only, and all reads are sequential.
   - Source files are read repeatedly (e.g., the header files), so the
   workload can benefit hugely from client-side caching (see the note
   after this list).
   - Intermediate object files, libraries, and binaries are small to
   medium in size; the largest generated binary is a few hundred
   megabytes.
   - Binary/object files are generated by *small random writes*.
   - There is NO concurrent/shared access to the same file. Each build
   task generates its own output file.
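
On the Lustre side, this caching behaviour is largely governed by the
llite parameters on each client. A minimal sketch of how to inspect
them (standard parameter names; we have not tuned these away from the
defaults):

# Upper bound on client page cache used for Lustre file data (MiB)
lctl get_param llite.*.max_cached_mb
# Per-file read-ahead budget (MiB); our reads are sequential, so
# read-ahead should already be effective
lctl get_param llite.*.max_read_ahead_mb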

With this use case in mind, we are exploring alternatives to NFS on
NetApp, with the goal of achieving comparable performance at reduced
cost. So far we have benchmarked a distributed build of GCC 8.1 on AWS
with Lustre, but its performance lags quite a bit behind even kernel
NFS:

*Lustre Setup*


*Lustre Server*


   - 2 MDS nodes, each an m5.2xlarge instance (8 vCPUs, 32 GiB memory, up
   to 10 Gb network), backed by an 80 GiB SSD formatted with ldiskfs.
   - DNE phase II (striped directories) is enabled (see the example after
   this list).
   - No data striping is enabled, because most files are small.
   - 4 OSS nodes, each an m5.xlarge instance (4 vCPUs, 16 GiB memory, up
   to 10 Gb network), backed by a 40 GiB SSD formatted with ldiskfs.
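
For reference, we create the striped (DNE phase II) directories with
lfs mkdir; a sketch, with the mount point and directory name as
placeholders:

# Stripe the directory across both MDTs so metadata ops are spread out
lfs mkdir -c 2 /mnt/lustre/gcc-build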

*Build cluster*

30 m5.xlarge build hosts (120 vCPUs in total), all mounting the same
Lustre file system.


   - The following is configured on all build hosts:

# Use client-local flock; safe here because no file is ever accessed
# from more than one build host
mount -t lustre -o localflock …

# Disable wire checksums to save client CPU
lctl set_param osc.*.checksums=0

# Allow more concurrent RPCs and more dirty page cache per OSC
lctl set_param osc.*.max_rpcs_in_flight=32
lctl set_param osc.*.max_dirty_mb=128
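
Since the workload is metadata-heavy (tens of thousands of small files),
we assume the metadata-side counterparts of the OSC settings above may
matter as well; a sketch with illustrative values, not something we have
validated:

# More concurrent metadata RPCs per MDC
lctl set_param mdc.*.max_rpcs_in_flight=64
# Larger statahead window for scanning big directories
lctl set_param llite.*.statahead_max=128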


*Test and results*

Running distributed build of GCC 8.1 in the Lustre mount across the build
cluster:


Launching 1 build only:

   - Takes on average *17 minutes 45 seconds* to finish.

Launching 20 builds at the same time, all sharing the same build cluster:

   - Takes on average *46 minutes* to finish for each build.


By the way, we have tried the Data-on-MDT feature (since we are using
Lustre 2.11), but we did not observe any performance improvement.
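
For reference, a DoM layout is applied per directory with lfs setstripe;
a sketch with an illustrative 64 KiB MDT component (not necessarily the
size we benchmarked) and a placeholder path:

# First 64 KiB of each new file lives on the MDT, remainder on OSTs
lfs setstripe -E 64K -L mdt -E -1 /mnt/lustre/gcc-build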



*Kernel NFS Setup*


*NFS Server*

1 NFS server on an m5.2xlarge instance (8 vCPUs, 32 GiB memory, up to
10 Gb network), backed by a 300 GiB SSD formatted with XFS.



*Build cluster*

30 m5.xlarge build hosts (120 vCPUs in total), all mounting the same NFS
volume using the NFSv3 protocol.
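
The client-side mount is along these lines; server name, export path,
and mount point are placeholders, and options beyond vers=3 are not
shown:

mount -t nfs -o vers=3 nfsserver:/export/build /mnt/build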

*Test and results*
Running distributed build of GCC 8.1 in the NFS mount across the build
cluster:

Launching 1 build only:

   - Takes on average *16 minutes 36 seconds* to finish. About 1 minute
   faster than Lustre.


Launching 20 builds at the same time, all sharing the same build cluster:

   - Takes on average *38 minutes* to finish for each build. About 8
   minutes faster than Lustre.



So our question to the Lustre experts: given the distributed build use
case, do you suggest anything else we can try to further improve the
performance?

Thanks,
ading