[Lustre-discuss] Optimize parallel compilation on Lustre

Maxence Dunnewind maxence at dunnewind.net
Mon Jun 28 09:04:03 PDT 2010


Heya,
> If you are interested to do a tiny bit of hacking, it would be interesting to do an experiment to see what kind of performance can be gotten in your benchmark by a single client.  Currently, Lustre limits each client to a single filesystem-modifying metadata operation at one time, in order to prevent the clients from overwhelming the server, and to ensure that the clients can recover the filesystem correctly in case of a server crash.
I just tested this. Before that, I tried an out-of-tree build. My four clients
use an nfsroot, so I put the kernel source there, mounted Lustre on
/mnt/lustre, and compiled in /mnt/lustre/build (with make O=). The results
without your patch are interesting:
7 min 42 s against 9 min 7 s before, with -j 4
4 min 51 s against 5 min 34 s, with -j 8
3 min 27 s against 4 min 19 s, with -j 16
I also use -pipe as a gcc option, to avoid temporary files.
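For reference, here is a minimal sketch of the setup described above. The MGS NID, filesystem name, and source path are assumptions, not taken from this thread:

```shell
# Sketch of the out-of-tree build described above; all paths and the
# "mgsnode@tcp:/lustre" target are hypothetical placeholders.
mount -t lustre mgsnode@tcp:/lustre /mnt/lustre
mkdir -p /mnt/lustre/build

# Kernel source stays on the nfsroot; objects go to Lustre via O=.
cd /usr/src/linux
make O=/mnt/lustre/build defconfig
# KCFLAGS is kbuild's hook for extra compiler flags such as -pipe.
make O=/mnt/lustre/build KCFLAGS=-pipe -j8
```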

So, my first question is: would it be possible to disable cache coherency on
a subdirectory? If I know all the files in that directory will only be
accessed read-only, I do not need coherency, and it would let me read the
files from Lustre instead of NFS.

I then tried with your patch; not much difference:

4 min 43 s against 4 min 51 s without it, with -j 8
7 min 40 s against 7 min 42 s, with -j 4
So it changes almost nothing :)

> I'm not sure if it makes a difference in your case or not, but increasing the MDC RPCs in flight might also help performance.  Also, increasing the client cache size and the number of IO RPCs may also help.  On the clients run:
> 
> lctl set_param *.*.max_rpcs_in_flight=64
> lctl set_param osc.*.max_dirty_mb=512
No change with either setting.

> You may also test running the make directly on the MDS with a local Lustre mount to determine if the network latency is a significant factor in the performance. If you are using Ethernet instead of IB the latency could be hurting you, since kernel compiles are generally only doing a tiny amount of work per file and then you need to send a few RPCs to open and read the next file and the headers.  Some of this can be hidden by pre-reading all of the files into the client caches (new machines should have enough RAM, about 1GB or so), but the "open" operations still need to send an RPC to the MDS for each file open, so running on the MDS (or with a low-latency network like IB) may help compiles like this run more quickly.
We don't have IB set up at the moment, so I cannot test with it. I will try
directly on the MDS (so on only one node) to compare.
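To try the pre-reading idea from the quoted paragraph, something like this could warm the client cache before the build. SRCDIR is a hypothetical path; as noted above, the open RPCs to the MDS still happen at compile time:

```shell
# Pre-read sources and headers into the client page cache so the
# compile mostly hits cached data. SRCDIR is an assumed location.
SRCDIR=${SRCDIR:-/mnt/lustre/linux}
find "$SRCDIR" -type f \( -name '*.c' -o -name '*.h' \) -print0 \
    | xargs -0 --no-run-if-empty cat > /dev/null
```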

Regards,

Maxence
-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533

