Andreas,<br><br>In theory, this should work well. Each index is about 15 GB, and the same index is usually used many times in a row. These are genome alignments, so we will, for instance, align all of a human experiment before we switch indexes to align all of a rat genome. As such, 32 GB should be plenty of RAM to hold one full genome index plus all of the other small and medium files that may go through Lustre at the same time. I suppose the main complication is a user reading a large file (one still small enough to be cached) in the middle of an alignment, which could very well happen.<br>
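<br>As a rough check on the sizing above, here is a minimal sketch of the arithmetic. It assumes each index is striped evenly across the four OSSs, so each OSS only has to cache its own share. The function names and the 8 GiB reserve for the kernel and other files are hypothetical values chosen for illustration, not measurements.<br>

```python
GIB = 1024 ** 3


def per_oss_share(file_bytes: int, oss_count: int) -> int:
    """Bytes of one evenly striped file stored on each OSS (rounded up)."""
    return -(-file_bytes // oss_count)


def indexes_that_fit(ram_bytes: int, file_bytes: int, oss_count: int,
                     reserve_bytes: int = 0) -> int:
    """How many striped indexes fit in one OSS's RAM after a reserve."""
    usable = max(ram_bytes - reserve_bytes, 0)
    return usable // per_oss_share(file_bytes, oss_count)


index_size = 15 * GIB          # one genome index, from the message above
share = per_oss_share(index_size, 4)
print(share / GIB)             # 3.75 GiB of each index per OSS

# Assumed reserve of 8 GiB per OSS for everything that is not an index.
print(indexes_that_fit(32 * GIB, index_size, 4, reserve_bytes=8 * GIB))
```

<br>Under these assumptions each OSS needs under 4 GiB per index, so the risk is less about raw capacity and more about other cacheable files evicting the index.<br>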

<br>By default, will Lustre 1.8 use all available memory for the cache, or is there a tunable parameter to cap the total cache size (something like a readcache_max_size)? If someone has free cycles, a cool feature for a future release would be a mechanism to set cache affinities, for instance to let me give indexes precedence over a 15 GB scratch file.<br>

<br>Thanks so much,<br>Jordan<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Apr 02, 2009  15:17 -0700, Jordan Mendler wrote:<br>

> I deployed Lustre on some legacy hardware and as a result my (4) OSS's each<br>

> have 32GB of RAM. Our workflow is such that we are frequently rereading the<br>

> same 15GB indexes over and over again from Lustre (they are striped across<br>

> all OSS's) by all nodes on our cluster. As such, is there any way to<br>

> increase the amount of memory that either Lustre or the Linux kernel uses to<br>

> cache files read from disk by the OSS's? This would allow much of the<br>

> indexes to be served from memory on the OSS's rather than disk.<br>

<br>

With Lustre 1.8.0 (in late release testing, you could grab<br>

v1_8_0_RC5 from CVS for testing[*]) there is OSS server-side caching<br>

of read and just-written data.  There is a tunable,<br>

/proc/fs/lustre/obdfilter/*/readcache_max_filesize, that limits the<br>

maximum file size that is cached on the OSS, so that small files can<br>

be cached and large files will not wipe out the read cache.<br>

<br>

Set readcache_max_filesize just large enough to hold your index files<br>

(which are hopefully not too large individually) to maximize your<br>

cache retention.  While the cache eviction is LRU, it may be that<br>

at high IO rates your working set would still be evicted from RAM<br>

if too many other files fall within the cache file size limit.<br>
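<br>
A minimal sketch of applying such a limit on an OSS, assuming the tunable<br>
accepts a plain decimal byte count written to the /proc file named above.<br>
The helper name and its arguments are hypothetical, and it needs root on<br>
the OSS to write anything.<br>

```python
import glob
from pathlib import Path
from typing import List

# Path pattern quoted from the message; one entry per OST on this OSS.
PROC_GLOB = "/proc/fs/lustre/obdfilter/*/readcache_max_filesize"


def set_readcache_max_filesize(limit_bytes: int,
                               pattern: str = PROC_GLOB) -> List[str]:
    """Write limit_bytes to every file matching pattern.

    Returns the list of paths written, which is empty when the pattern
    matches nothing (for example on a host without Lustre loaded).
    Assumption: the tunable takes a decimal byte count.
    """
    written = []
    for path in sorted(glob.glob(pattern)):
        Path(path).write_text(f"{limit_bytes}\n")
        written.append(path)
    return written
```

Calling set_readcache_max_filesize(4 * 1024**3) would, under that<br>
assumption, stop anything above 4 GiB from being kept in the read cache.<br>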

<br>

[*] Note that v1_8_0_RC5 is missing the fix for bug 18659, so it is not at<br>

all safe to use on the MDS; v1_8_0_RC6 will have that fix, as does b1_8.<br>

<br>

> I see a *lustre.memused_max = 48140176* parameter, but not sure what that<br>

> does. If it matters, my setup is such that each of the 4 OSS's serves 1 OST<br>

> that consists of a software RAID10 across 4 SATA disks internal to that OSS.<br>

<br>

That is just reporting the total amount of RAM ever used by the<br>

Lustre code itself (48MB in this case), and has nothing to do with<br>

the cached data.<br>

<br>

Cheers, Andreas<br>

--<br>

Andreas Dilger<br>

Sr. Staff Engineer, Lustre Group<br>

Sun Microsystems of Canada, Inc.<br></blockquote></div><br>