[Lustre-discuss] buffering

Andreas Dilger andreas.dilger at oracle.com
Thu Aug 12 14:53:59 PDT 2010

On 2010-08-12, at 15:08, Mark Nelson wrote:
> How do the kernel and storage on the OSSes aggregate writes when the number of service threads is increased?

The OSS layer does not aggregate writes itself.  Aggregation happens either on the client, before the write RPCs are generated, or at the bottom end in the block device (the elevator and/or the cache on h/w RAID devices).

There is a research project called "Network Request Scheduler" that aims to submit the IOs in a more coherent order at the OSS thread level, to facilitate block device merging, but it will not explicitly merge the IOs itself.
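
As a rough illustration of why submission order matters for block device merging, here is a minimal Python sketch.  It is a toy model, not Lustre or kernel code: the function name merge_adjacent and the (offset, length) tuples are hypothetical, and a real elevator also sorts and merges within its own queue.  The sketch only merges a request into the one submitted immediately before it.

```python
def merge_adjacent(requests):
    """Merge each request into the previous one if it starts exactly
    where the previous one ends (a simplified back-merge).

    requests: list of (offset, length) tuples in submission order.
    This is a hypothetical model, not the Linux elevator.
    """
    merged = []
    for offset, length in requests:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


# Four 1-unit IOs from different threads, arriving out of order.
arrival_order = [(0, 1), (2, 1), (1, 1), (3, 1)]

# Submitted as they arrive, nothing is contiguous with its predecessor.
unordered = merge_adjacent(arrival_order)          # 4 separate requests

# Submitted in offset order, everything collapses into one request.
ordered = merge_adjacent(sorted(arrival_order))    # [(0, 4)]
```

The ordering step is the part a request scheduler at the OSS thread level would contribute; the merging itself would still happen in the block device.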

> The Lustre tuning section on the wiki mentions that there are "internal I/O buffers".  How is aggregating those writes different from the way the dirty cache on the clients works?
> http://wiki.lustre.org/index.php/Lustre_Tuning

In 1.6 and earlier there was an explicit 1MB pre-allocated receive buffer for every thread, used to stage a single IO RPC from network RDMA and submit it to the block layer.  In 1.8 and later this 1MB of memory is dynamically allocated from the page cache, at least for the duration of the IO submission.  Then, depending on the /proc tunables (read_cache_enable, writethrough_cache_enable, readcache_max_filesize), the OSS will either discard the pages immediately, or keep them in memory and let the VM evict them when there is memory pressure (if they are not accessed).
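
The keep-or-discard decision can be sketched roughly as follows in Python.  This is a simplified model under my own assumptions, not the OSS code: the class and function names are hypothetical, and I am assuming that read_cache_enable governs pages from reads, writethrough_cache_enable governs pages from writes, and readcache_max_filesize is an upper bound on the size of files whose pages are kept at all.

```python
from dataclasses import dataclass


@dataclass
class OssCacheTunables:
    # Field names mirror the /proc tunables; the class itself is hypothetical.
    read_cache_enable: bool
    writethrough_cache_enable: bool
    readcache_max_filesize: int  # bytes


def keep_pages_after_io(tunables, is_write, file_size):
    """Return True if pages stay in the page cache after the IO is
    submitted, False if they are discarded immediately.

    Assumed semantics (a simplification, see the text above).
    """
    if file_size > tunables.readcache_max_filesize:
        return False  # file too large to be worth caching
    if is_write:
        return tunables.writethrough_cache_enable
    return tunables.read_cache_enable


# Example: cache reads but not writes, and only for files up to 32MB.
tunables = OssCacheTunables(
    read_cache_enable=True,
    writethrough_cache_enable=False,
    readcache_max_filesize=32 * 1024 * 1024,
)
small_read = keep_pages_after_io(tunables, is_write=False, file_size=4096)
small_write = keep_pages_after_io(tunables, is_write=True, file_size=4096)
large_read = keep_pages_after_io(tunables, is_write=False, file_size=1 << 30)
```

In every case the pages exist for the duration of the IO submission; the tunables only affect what happens afterwards, and pages that are kept remain subject to normal VM eviction.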

> On 08/12/2010 12:35 PM, Andreas Dilger wrote:
>> On 2010-08-11, at 23:36, burlen wrote:
>>> I am interested in how write()s are buffered in Lustre on the client,
>>> server, and network in between. Specifically I'd like to understand what
>>> happens during writes when a large number of clients are making large
>>> writes to all of the OSTs on an OSS, and the buffers are inadequate to
>>> handle the outgoing/incoming data.
>> Lustre doesn't buffer dirty pages on the OSS, only on the client.  The clients are granted a "reserve" of space in each OST filesystem to ensure there is enough free space for any cached writes that they do.
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
> -- 
> Mark Nelson, Lead Software Developer
> Minnesota Supercomputing Institute
> Phone: (612)626-4479
> Email: mark at msi.umn.edu

Cheers, Andreas
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
