[lustre-discuss] lustre OSC and system cache

Mon Dec 12 18:28:32 PST 2016

Andreas

The file system has lru_max_age=9000000.  I have searched for what this
parameter controls but have found little.  Is there documentation on how
memory management works in Lustre?  I am also unsure what "lru" refers
to in this context.  Why are two files on the same node not governed by
the same LRU mechanism?  SCR300's pages are being evicted even though
they were clearly used more recently than any of SCRATCH's.
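
As a rough aid for reading this value: the reply quoted below notes that
lru_max_age is expressed in jiffies, so its equivalent in seconds depends
on the kernel's HZ.  The sketch below only performs that division.  The
function name lru_age_to_seconds is made up for illustration, and the HZ
values 1000 and 250 are assumptions, since the thread does not state the
node's HZ.

```shell
#!/bin/sh
# Convert an lru_max_age value to seconds, ASSUMING the value is in jiffies.
# $1 = lru_max_age value, $2 = kernel HZ (assumed; check the kernel config).
# lru_age_to_seconds is a hypothetical helper, not part of Lustre.
lru_age_to_seconds() {
    echo $(( $1 / $2 ))
}

# On a Lustre client the current value can be read with:
#   lctl get_param ldlm.namespaces.*.lru_max_age

lru_age_to_seconds 9000000 1000   # this file system's value, HZ=1000 assumed
lru_age_to_seconds 9000000 250    # same value, HZ=250 assumed
lru_age_to_seconds 3900000 1000   # value from the suggested setting
```

Under these assumptions, 9000000 corresponds to 9000 seconds (2.5 hours)
at HZ=1000, or to 36000 seconds (10 hours) at HZ=250.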

Thanks

John

On 12/12/2016 6:59 PM, Dilger, Andreas wrote:
> On Dec 12, 2016, at 15:50, John Bauer <bauerj at iodoctors.com> wrote:
>> I'm observing some undesirable caching of OSC data in the system buffers.  This is a single-node, single-process application.  There are 2 files of interest, SCRATCH and SCR300; both are scratch files with stripeCount=4.  The system has 128GB of memory.  Lustre maxes out at about 59GB of memory used for caching.
>> SCRATCH: About 22GB is written/read during the first 300 seconds of the run.  There is no further activity on the file (though it remains open) until about 18,700 seconds into the run, when another 22GB is written/read.  This is illustrated in the top frame of the first plot below.  The bottom frame of the first plot shows the amount of system cache used by each of the 4 OSCs associated with the file over the course of the run (nearly identical, as would be expected).  Note that each of the OSCs retains its 5.5GB of memory even though nothing is happening to the file.
>> SCR300: A 110GB file, written and repeatedly read between the times of the above SCRATCH file's I/O.
>>
>> What is of interest is that while SCR300 is doing all its I/O, and its associated OSCs are competing with each other for caching memory, the 4 OSCs for the inactive file (SCRATCH) retain their 22GB of memory.  Why are the 4 OSCs for the inactive file exempt from giving up their memory?  The behavior is very reproducible.
> You don't mention which Lustre version you are using, which makes it hard
> to comment specifically.  That said, you could try reducing the lock LRU
> age, whose default was changed in the 2.8 or 2.9 release to 3900s
> (65 minutes) from 36000s (10h), via:
>
>          lctl set_param ldlm.namespaces.*.lru_max_age=3900000
>
> (though check what your current setting is, since the units are in
> "jiffies" (HZ) and that may differ depending on kernel compile options).
>
> Cheers, Andreas
>
>> The application is MSC.Nastran, which has the capability to put the data for SCR300 inside of SCRATCH (increasing its size to 132GB).  If run in this mode, the caching is much better behaved and the job runs in 11,500 seconds, versus 19,000.  This is illustrated in the 3rd plot below.  While this is a solution for this case, it is not a general solution.
>>
>> Thanks
>>
>> John
>> Plots for SCRATCH
>> <bfoimgfaenjmgmii.png>
>>
>>
>> Plots for SCR300
>>
>> <mncccijbfkiekmmn.png>
>>
>>
>> Plots for SCR300 inside of SCRATCH
>>
>> <adnondhpelpohhjf.png>
>> -- 
>> I/O Doctors, LLC
>> 507-766-0378
>>
>> bauerj at iodoctors.com
