[Lustre-discuss] high OSS load - readcache_max_filesize

Thomas Roth t.roth at gsi.de
Thu May 5 10:39:25 PDT 2011


Hi all,

a recent posting here (which I can't find at the moment) pointed me to 
http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that 
we seem to be seeing as well: some of our OSSes get severely 
overloaded, and the logs say

slow journal start 36s due to heavy IO load
slow commitrw commit 36s due to heavy IO load
slow start_page_read 169s due to heavy IO load
slow direct_io 34s due to heavy IO load
...

The discussion in that ticket seems to propose a number of steps to 
take on each OSS as a workaround, among them setting
readcache_max_filesize=32M  or  readcache_max_filesize=0
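
If I understand the workaround correctly, it would be applied with 
something along these lines - obdfilter.* is where I find the tunable 
here, the path may differ on other versions:

   # limit the OSS read cache to files of at most 32 MB
   lctl set_param obdfilter.*.readcache_max_filesize=32M
   # or switch file-data read caching off altogether
   lctl set_param obdfilter.*.readcache_max_filesize=0

As far as I know, set_param only changes the running value, so it would 
have to be reapplied after a remount unless made persistent.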

I have checked the current value of this parameter and found
readcache_max_filesize=18446744073709551615
which is 2^64 - 1, i.e. roughly 16 EiB (if I counted the powers of 1024 
correctly).
Am I correct in assuming that this is the default value, and that this 
default is meant to read "unlimited"? Or is our OSS configuration just 
badly messed up?
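
For reference, I queried it like this - again assuming obdfilter.* is 
the right place:

   lctl get_param obdfilter.*.readcache_max_filesize
   # every OST here reports 18446744073709551615 = 2^64 - 1,
   # i.e. an all-ones 64-bit value, which I read as "no limit"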


Also, people recommend pinning the bitmaps to memory - how do you do that?

The preallocation tables all seem to contain "256 512 1024" (checked as 
shown below), so no shrinking of prealloc_table is necessary.
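
That is, I looked at them like this - assuming the table is exposed 
under /proc/fs/ldiskfs, which may vary with the ldiskfs version:

   # path is an assumption; adjust to wherever prealloc_table lives
   cat /proc/fs/ldiskfs/*/prealloc_table
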
The OSTs in question have just reached the 85% fill level. We have a 
number of older OSSes which are closer to 95% - I guess the problem 
doesn't show up there because there is no room for further files anyhow...
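
(The fill levels above are the per-OST usage as reported by e.g.

   # per-OST usage; /lustre is just a placeholder for our mount point
   lfs df -h /lustre

on a client.)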

Regards,
Thomas



