[Lustre-discuss] MDT overloaded when writing small files in large numbers
Jeff Darcy
jeffd at sicortex.com
Mon Dec 8 08:21:43 PST 2008
Daire Byrne wrote:
> In the past when we've had workloads with lots of small files where each client/job has a unique dataset we have used disk image files on top of a Lustre filesystem to store them. This file is then stored on a single OST and so it reduces the overhead of going to the MDS every time - it becomes a file seek operation. We've even used squashfs archives before for write once read often small file workloads which has the added benefit of saving on disk space. However, if the dataset needs write access to many clients simultaneously then this isn't going to work.
>
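A minimal sketch of the loopback-image and squashfs tricks described above (paths, sizes, and mount points are illustrative, and the mount steps require root):

```shell
# Create a sparse 1 GiB image on the Lustre FS; it lives on a single
# OST, so the small files inside it generate no per-file MDS traffic.
dd if=/dev/zero of=/lustre/images/job1.img bs=1M count=0 seek=1024
mkfs.ext3 -F /lustre/images/job1.img

# Loopback-mount it on the one client that needs read/write access.
mount -o loop /lustre/images/job1.img /mnt/job1

# For write-once/read-often datasets, a compressed squashfs archive
# saves disk space as well:
mksquashfs /scratch/dataset /lustre/images/dataset.sqsh
mount -t squashfs -o loop,ro /lustre/images/dataset.sqsh /mnt/dataset
```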
We've used the same trick for read-only stuff too. In one case, using
NBD to serve images that lived in a Lustre FS yielded a >12x improvement
vs. using Lustre directly for an application that read ~80K (very) small
files. Unfortunately, there's no equivalent solution for writes, even
if the writes are only occasional. If people wanted to brainstorm about
combinations of NFS, NBD, unionfs, and other tricks that can be layered
on Lustre to good effect in many-small-file cases (which aren't as
uncommon in HPC as you'd think), it might be a very worthwhile
exercise. I think many users and vendors hit this sooner or later, and
agreeing on some recipes that could double as use/test cases would be to
everyone's benefit.
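The read-only NBD setup mentioned above can be sketched roughly as follows (hostnames, ports, and paths are illustrative, and the syntax follows the classic nbd-tools invocation; the mount step requires root):

```shell
# On a node that mounts the Lustre FS, export the image read-only
# (-r) on TCP port 10809:
nbd-server 10809 /lustre/images/dataset.img -r

# On each compute client, attach the export as a block device and
# mount it read-only:
nbd-client imageserver 10809 /dev/nbd0
mount -o ro /dev/nbd0 /mnt/dataset
```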