[lustre-discuss] Avoiding system cache when using ssd pfl extent

Andreas Dilger adilger at whamcloud.com
Fri May 20 00:53:31 PDT 2022


To elaborate a bit on Patrick's answer, there is no mechanism to do this on the *client*, because the performance difference between client RAM and server storage is still fairly significant, especially if the application is doing sub-page read or write operations.

However, on the *server* the OSS and MDS will *not* put data from flash storage into the page cache, because using the kernel page cache has a measurable overhead, and (at least in our testing) NVMe IOPS performance is actually better *without* the page cache because more CPU is available to handle RPCs.  This is controlled on the server with osd-ldiskfs.*.{read_cache_enable,writethrough_cache_enable}, which default to 0 if the block device is non-rotational and to 1 if it is rotational.
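
As a rough illustration of that default rule (not the actual osd-ldiskfs code), the sketch below reads the kernel's rotational flag for a block device and reports the defaults one would expect from the description above. The function name, the returned dictionary, and the sysfs_root argument are made up for this example; only the /sys/block/<dev>/queue/rotational file and the two parameter names are real.

```python
from pathlib import Path


def expected_cache_defaults(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the cache defaults described above for one block device.

    Hypothetical helper: it only mirrors the rule "0 for non-rotational,
    1 for rotational" and does not query a running Lustre server.
    """
    # The kernel exposes "1" for rotational (HDD) and "0" for non-rotational
    # (SSD/NVMe) devices in this sysfs file.
    flag = Path(sysfs_root, device, "queue", "rotational").read_text().strip()
    value = 1 if flag == "1" else 0
    return {"read_cache_enable": value, "writethrough_cache_enable": value}


if __name__ == "__main__":
    import sys

    # Usage: python defaults.py nvme0n1
    for dev in sys.argv[1:]:
        print(dev, expected_cache_defaults(dev))
```

The values actually in effect on a server are whatever lctl get_param reports for osd-ldiskfs.*.read_cache_enable and osd-ldiskfs.*.writethrough_cache_enable; an admin may have overridden the defaults.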

Separately, there is a tunable for avoiding the page cache for large read/write RPCs: osd-ldiskfs.*.readcache_max_io_mb, which is 8 by default, so RPCs >= 8MB go directly to the disk, to avoid blowing out the page cache on the server.
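
To make the threshold concrete, here is a minimal sketch of the size check as described above. It is a model of the rule only, not the server implementation: the function name is hypothetical, and treating "MB" as 2^20 bytes is an assumption.

```python
MIB = 1024 * 1024  # assumption: "MB" in the tunable means 2**20 bytes


def bypasses_page_cache(rpc_bytes: int, readcache_max_io_mb: int = 8) -> bool:
    """Model of the rule above: RPCs at or over the limit skip the page cache."""
    return rpc_bytes >= readcache_max_io_mb * MIB


if __name__ == "__main__":
    for size_mb in (1, 4, 8, 16):
        print(f"{size_mb} MiB RPC bypasses cache: {bypasses_page_cache(size_mb * MIB)}")
```

With the default of 8, a 4 MiB RPC would still go through the server page cache (if caching is enabled for that device at all), while 8 MiB and 16 MiB RPCs would not.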

Cheers, Andreas

> On May 19, 2022, at 12:21, Patrick Farrell via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
> 
> Well, you could use two file descriptors, one opened with O_DIRECT and one without. 🙂
> 
> SSD is a fast medium but my instinct is the desirability of having data in RAM is much more about I/O pattern and hard to optimize for in advance - Do you read the data you wrote?  (Or read data repeatedly?)
> 
> In any case, there's no mechanism today.  It's also relatively marginal if we're just doing buffered I/O then forcing the data out - it will reduce memory usage but it won't improve performance.
> 
> -Patrick
> 
> From: John Bauer <bauerj at iodoctors.com>
> Sent: Thursday, May 19, 2022 1:16 PM
> To: Patrick Farrell <pfarrell at ddn.com>; lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent
>  
> Pat,
> No, not in general.  It just seems that if one is storing data on an SSD it should be optional to have it not stored in memory (why store it in two fast media?).
> O_DIRECT is not of value as that would apply to all extents, whether on SSD or HDD.  O_DIRECT on Lustre has been problematic for me in the past, performance-wise.
> John
> On 5/19/22 13:05, Patrick Farrell wrote:
>> No, and I'm not sure I agree with you at first glance.
>> 
>> Is this just generally an idea that data stored on SSD should not be in RAM?  If so, there's no mechanism for that other than using direct I/O.
>> 
>> -Patrick
>> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of John Bauer <bauerj at iodoctors.com>
>> Sent: Thursday, May 19, 2022 12:48 PM
>> To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
>> Subject: [lustre-discuss] Avoiding system cache when using ssd pfl extent
>>  
>> When using PFL, and using an SSD as the first extent, it seems it would 
>> be advantageous to not have that extent's file data consume memory in 
>> the client's system buffers.  It would be similar to using O_DIRECT, but 
>> on a per-extent basis.  Is there a mechanism for that already?
>> 
>> Thanks,
>> 
>> John
>> 
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud