[Lustre-discuss] SSD caching of MDT

Robin Humble robin.humble+lustre at anu.edu.au
Thu Aug 19 12:33:35 PDT 2010


On Thu, Aug 19, 2010 at 01:29:37PM +0100, Gregory Matthews wrote:
>Article by Jeff Layton:
>
>http://www.linux-mag.com/id/7839
>
>anyone have views on whether this sort of caching would be useful for 
>the MDT? My feeling is that MDT reads are probably pretty random but 
>writes might benefit...?

if you look at the tiny size of inodes in slabtop on an MDS you'll
see that read ops for most fs's are probably 100% cached in ram
on a decent sized MDS. ie. once you have traversed all the inodes of a
fs once, the MDT is likely a write-only medium, and the ram of the
MDS is a faster iops machine than any SSD could ever be.
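
as a rough way to put numbers on the slabtop observation above, here is
a minimal sketch that sums the memory held by a few slab caches from
/proc/slabinfo-style text. assumptions: the column order is the usual
slabinfo 2.x one (name, active_objs, num_objs, objsize, ...), and the
slab names ldiskfs_inode_cache and dentry are only guesses at what is
interesting on a given MDS. the function name and read_live() helper
are made up for illustration.

```python
def slab_cache_bytes(slabinfo_text, names=("ldiskfs_inode_cache", "dentry")):
    """Return {slab_name: approx bytes held by active objects}.

    Assumes slabinfo 2.x column order:
    name active_objs num_objs objsize objperslab pagesperslab : ...
    The default slab names are assumptions, not a definitive list.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version line, the '# name ...' header and other slabs.
        if len(fields) < 4 or fields[0] not in names:
            continue
        active_objs = int(fields[1])
        objsize = int(fields[3])
        usage[fields[0]] = active_objs * objsize
    return usage


def read_live(path="/proc/slabinfo"):
    """Read the live slab table (usually needs root). Not called here."""
    with open(path) as f:
        return slab_cache_bytes(f.read())
```

comparing the total against the ram in the MDS gives a feel for whether
the inodes of the whole fs can stay cached.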

you are then left with an MDT workload of entirely small writes. that is
definitely not an SSD sweet spot - many SSDs will fragment badly and
slow down horrendously, which eg. JBODs of 15k rpm SAS disks will not do.
basically beware of cheap SSDs, possibly any SSD, and certainly any SSD
that isn't an Intel X25-E or better. I would not inflict the Marvell
controller SSDs we sadly have many of now upon any MDT.
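
to get a first feel for how a device copes with that kind of load, a toy
small-synchronous-write test can be sketched as below. this is only an
illustration: the function name, file size, block size and write count
are all made-up defaults, and a real evaluation would run a proper
benchmark tool for long enough to push an SSD past its fresh-out-of-box
behaviour.

```python
import os
import random
import tempfile
import time


def small_sync_write_iops(path, file_size=8 * 2**20, block=4096,
                          n_writes=200, seed=0):
    """Write `block`-sized chunks at random aligned offsets, fsync-ing
    after each one, and return the achieved writes per second."""
    rng = random.Random(seed)
    buf = b"\0" * block
    n_blocks = file_size // block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)          # preallocate the target range
        start = time.perf_counter()
        for _ in range(n_writes):
            offset = rng.randrange(n_blocks) * block
            os.pwrite(fd, buf, offset)       # small random write
            os.fsync(fd)                     # force it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return n_writes / elapsed


if __name__ == "__main__":
    # Runs against a temp file; point `path` at the device's fs to test it.
    with tempfile.TemporaryDirectory() as d:
        print(f"{small_sync_write_iops(os.path.join(d, 'probe.bin')):.0f} writes/s")
```

repeating the run after the device has been filled and rewritten a few
times is what would show the slowdown, if there is one.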

also, having experimented with ramdisk MDTs (not in production
obviously), it is clear that even this 'perfect' medium doesn't solve
all Lustre iops problems. far from it. usually it just means that you
hit algorithmic or NUMA problems in the Lustre MDS code, or (more likely)
the ops just flow onto the OSTs and those become the bottleneck instead.
basically ramdisk MDT speedups weren't big even over just, say, 16 fast
FC or SAS disks. SSDs would be in-between if they were behaving
perfectly, and it would take extensive testing to determine whether
they are.
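
the bottleneck-shifting argument above can be written down as a trivial
model: the sustained rate of a chain of stages is capped by the slowest
one. the stage names and rates below are invented purely for
illustration and are not measurements of any real system.

```python
def end_to_end_ops(stage_rates):
    """Return (bottleneck_stage, ops_per_sec) for stages in series.

    `stage_rates` maps a stage name to the ops/s it can sustain alone.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical numbers: swapping disks for a ramdisk only changes "mdt".
disk_mdt = {"mdt": 30_000, "mds_code": 20_000, "osts": 8_000}
ram_mdt = dict(disk_mdt, mdt=1_000_000)

print(end_to_end_ops(disk_mdt))
print(end_to_end_ops(ram_mdt))
```

in this made-up case both configurations are limited by the same stage,
so the faster MDT medium buys nothing end to end.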

looking at it a different way, Lustre's statahead kinda works ok,
creates are (IIRC) batched so also scale ok, so deletes might be
the only workload left where the fastest MDT money can buy would get
you any significant benefit... probably not worth the spend for most
folks.

assuming for a moment that SSDs worked as they should, then other
Lustre-related workloads for which SSDs might be suitable are external
journals for OSTs, md bitmaps, or (one day) perhaps ZFS intent logs.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility


