[Lustre-discuss] Slow Copy (Small Files) 1 RPC In Flight?

Fri Jun 21 19:47:08 PDT 2013

Hello!

On Jun 21, 2013, at 9:07 PM, Andrew Mast wrote:
> Very clear, thank you for the explanation, I misunderstood readahead. Yes the 1gb and 10gb file transfer tests was on par with NFS.
> 
> Our use case is typically compiling and find/grep through (30gb) amounts of source code so it seems we are stuck with small files.

Generally this sort of workload is pretty bad for network filesystems due to large amounts of synchronous RPC traffic that you cannot easily predict.
You can get some speedup by doing several copies in parallel (e.g. one copy per top-level subtree or whatever), as then you'll at least get concurrent RPCs.
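
To make the "one copy per top-level subtree" idea concrete, here is a minimal sketch in Python. The function names (copy_entry, parallel_copy) and the worker count are my own assumptions for illustration, nothing Lustre-specific. It just starts one copy per top-level entry in a thread pool so that several copies have requests outstanding at once; how much that helps will depend on your tree layout.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_entry(src: Path, dst_root: Path) -> Path:
    """Copy one top-level entry (a file or a whole directory tree) into dst_root."""
    target = dst_root / src.name
    if src.is_dir():
        # dirs_exist_ok needs Python 3.8+; it lets a rerun continue into an existing tree.
        shutil.copytree(src, target, dirs_exist_ok=True)
    else:
        shutil.copy2(src, target)
    return target


def parallel_copy(src_root, dst_root, workers=8):
    """Copy each top-level entry of src_root in its own worker thread.

    Threads are enough here: the workers spend their time blocked in file I/O,
    so the copies overlap instead of running strictly one after another.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    entries = sorted(src_root.iterdir())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda entry: copy_entry(entry, dst_root), entries))
```

Usage would be something like parallel_copy("/mnt/lustre/src", "/local/src") (both paths hypothetical). If one subtree is much larger than the rest, it will dominate the runtime, so you may need to split at a deeper level.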

I know some people try to combat this by running a block device on top of the network filesystem and then running some sort of local fs (say, ext4)
on top of that block device (loopback based). That allows readahead to work, caching to work much better and so on. But this is not without limitations either:
only a single node can have this filesystem-file mounted at any one time.

If you do not have any significant writes to this fileset (if any at all) but a lot of consecutive reads/greps…, you might want to just store the entire working set as a tar file that you read and unpack locally on a client (should be pretty fast) to, say, a ramfs (needs tons of RAM, of course) and then do the searches there. Also not ideal, but at least the network filesystem would then be doing what it's best suited for: large transfers.
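
A rough sketch of that tar-then-search-locally approach, in Python. The helper names (unpack_to_local, grep_tree) are made up for illustration, and the code assumes the archive is one you created yourself (so it is trusted) and that the target directory sits on a RAM-backed or otherwise local filesystem with enough room.

```python
import tarfile
from pathlib import Path


def unpack_to_local(tar_path, local_dir):
    """Read the tar in one large sequential pass and unpack it under local_dir.

    Assumes a trusted archive; extractall() on untrusted input is unsafe.
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r:*") as archive:  # "r:*" autodetects compression
        archive.extractall(local_dir)
    return local_dir


def grep_tree(root, needle):
    """Return (path, line number) pairs for lines under root that contain needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with open(path, "r", errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if needle in line:
                        hits.append((str(path), lineno))
    return hits
```

For example, unpack_to_local("/mnt/lustre/src.tar", "/dev/shm/src") followed by grep_tree("/dev/shm/src", "some_symbol") (paths and symbol hypothetical). Only the single tar read touches the network filesystem; everything after that is local.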

If you can come up with some other way of storing a large number of smaller files in a single large combined file that you then access with special tools (like, I dunno, fuse-tarfs or whatever, assuming those don't read unneeded data but just skip over it, or something more specific to your case), this might be a winner too.
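
As one possible illustration of "skip over unneeded data" (not how fuse-tarfs or any particular tool works, just a sketch with made-up helper names): with an uncompressed tar you can scan the member headers once to build an index of name to (offset, size), and afterwards fetch any single file with one seek and one read. This assumes a plain uncompressed tar containing ordinary (non-sparse) files.

```python
import tarfile


def build_index(tar_path):
    """Map member name -> (data offset, size) for regular files in an uncompressed tar.

    Assumes no compression and no sparse members, so each file's bytes sit
    contiguously in the archive starting at offset_data.
    """
    index = {}
    with tarfile.open(tar_path, "r:") as archive:  # "r:" means: no compression
        for member in archive:
            if member.isreg():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(tar_path, index, name):
    """Read one file's contents directly, without touching the other members."""
    offset, size = index[name]
    with open(tar_path, "rb") as fh:
        fh.seek(offset)
        return fh.read(size)
```

The index itself could be stored next to the tar so that clients don't have to rescan the headers every time. Whether this beats unpacking everything locally depends on how much of the set a given search actually touches.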

Bye,
    Oleg