[Lustre-discuss] OSS Cache Size for read optimization (Andreas Dilger)

Jordan Mendler jmendler at ucla.edu
Mon Apr 6 12:44:35 PDT 2009


Andreas,

In theory, this should work well. Each index is about 15GB, and usually the
same index is used repeatedly for a stretch of time. These are genome
alignments, so we will, for instance, align all of a human experiment before
we switch indexes to align all of a rat experiment. As such, 32GB should be
plenty of RAM to hold one full genome index plus all of the other small and
medium files that may simultaneously go through Lustre. I suppose the main
complication is if a user reads a large file (though one still small enough
to be cached) in the middle of an alignment, as this could very well happen.

By default, will Lustre 1.8 use all available memory for this cache, or is
there a tunable parameter to cap the total read cache size? If someone has
free cycles, a cool feature for a future release would be a mechanism for
setting cache affinities, for instance letting me give the indexes precedence
over a 15GB scratch file.
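
For reference, below is a rough sketch of how I was planning to set
readcache_max_filesize on each OSS through the /proc tunable mentioned
further down (untested; the 16GB value is only an example sized to one of
our indexes, and the path assumes the 1.8 obdfilter layout):

#!/usr/bin/env python
# Minimal sketch: cap the OSS read cache to files up to ~16GB so that a
# whole index can stay cached.  Assumes the Lustre 1.8 obdfilter /proc
# layout described later in this thread; run as root on each OSS.
import glob

MAX_FILESIZE = 16 * 1024 ** 3  # bytes; big enough for a single ~15GB index

for path in glob.glob('/proc/fs/lustre/obdfilter/*/readcache_max_filesize'):
    with open(path) as f:
        print('%s (was %s)' % (path, f.read().strip()))
    with open(path, 'w') as f:
        f.write(str(MAX_FILESIZE))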

Thanks so much,
Jordan

> On Apr 02, 2009  15:17 -0700, Jordan Mendler wrote:
> > I deployed Lustre on some legacy hardware and as a result my (4) OSS's
> > each have 32GB of RAM. Our workflow is such that we are frequently
> > rereading the same 15GB indexes over and over again from Lustre (they
> > are striped across all OSS's) by all nodes on our cluster. As such, is
> > there any way to increase the amount of memory that either Lustre or
> > the Linux kernel uses to cache files read from disk by the OSS's? This
> > would allow much of the indexes to be served from memory on the OSS's
> > rather than disk.
>
> With Lustre 1.8.0 (in late release testing, you could grab
> v1_8_0_RC5 from CVS for testing[*]) there is OSS server-side caching
> of read and just-written data.  There is a tunable that allows
> limiting the maximum file size that is cached on the OSS so that
> small files can be cached, and large files will not wipe out the
> read cache, /proc/fs/lustre/obdfilter/*/readcache_max_filesize.
>
> Set readcache_max_filesize just large enough to hold your index files
> (which are hopefully not too large individually) to maximize your
> cache retention.  While the cache eviction is LRU, it may be that
> at high IO rates your working set would still be evicted from RAM
> if too many other files fall within the cache file size limit.
>
> [*] Note that v1_8_0_RC5 is missing the fix for bug 18659 so is not at
> all safe to use on the MDS, v1_8_0_RC6 will have that fix, as does b1_8.
>
> > I see a *lustre.memused_max = 48140176* parameter, but not sure what
> > that does. If it matters, my setup is such that each of the 4 OSS's
> > serves 1 OST that consists of a software RAID10 across 4 SATA disks
> > internal to that OSS.
>
> That is just reporting the total amount of RAM ever used by the
> Lustre code itself (48MB in this case), and has nothing to do with
> the cached data.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
