[lustre-discuss] Lustre OSS and clients on same physical server

Fri Jul 15 13:38:34 PDT 2016

Good input, Chris.  Thanks.

It sounds like we might need to move this over to lustre-devel.

Someday, I’d like to see us address some of these things and then add some test framework tests that co-locate clients with servers.  Not necessarily because we expect co-located services, but because it could be a useful driver of keeping Lustre a good memory manager.

-Cory

-- 

On 7/15/16, 3:17 PM, "Christopher J. Morrone" <morrone2 at llnl.gov> wrote:

On 07/15/2016 12:11 PM, Cory Spitz wrote:
> Chris,
> 
> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J. Morrone" <lustre-discuss-bounces at lists.lustre.org on behalf of morrone2 at llnl.gov> wrote:
> 
>> If you put both the client and server code on the same node and do any
>> serious amount of IO, it has been pretty easy in the past to get that
>> node to go completely out to lunch thrashing on memory issues
> 
> Chris, you wrote “in the past.”  How current is your experience?  I’m sure it is still a good word of caution, but I’d venture that modern Lustre (on a modern kernel) might fare a tad bit better.  Does anyone have experience on current releases?

Pretty recent.

We have had memory management issues with servers and clients
independently at pretty much all periods of time, recent history
included.  Putting the components together only exacerbates the issues.

Lustre still has too many of its own caches with fixed, or nearly fixed,
sizes, and places where it does not play well with the kernel
memory reclaim mechanisms.  There are too many places where Lustre
ignores the kernel's requests for memory reclaim, and often goes on to
use even more memory.  That significantly impedes the kernel's ability
to keep things responsive when memory contention arises.

> I understand that it isn’t a design goal for us, but perhaps we should pay some attention to this possibility?  Perhaps we’ll have interest in co-locating clients on servers in the near future as part of a replication, network striping, or archiving capability?

It is going to take a lot of work to make Lustre's memory usage
more dynamic, more aware of changing conditions on the system, and
more responsive to the kernel's requests to free memory.  I imagine it
won't be terribly easy, especially in areas such as dirty and unstable
data which cannot be freed until it is safe on disk.  But even for that,
there are no doubt ways to make things better.
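
To make the idea concrete, here is a minimal sketch of a cache that
honors reclaim requests.  It is a toy in Python, not Lustre code; the
class and method names (ReclaimableCache, count_reclaimable, shrink,
flush) are hypothetical and only loosely modeled on the count/scan
split of the Linux kernel shrinker interface.  Clean entries are freed
on request, while dirty entries are skipped until they have been
flushed, which is the hard case mentioned above.

```python
from collections import OrderedDict


class ReclaimableCache:
    """Toy cache that gives memory back when asked (hypothetical, not Lustre code)."""

    def __init__(self):
        # key -> (data, dirty); insertion order serves as a rough LRU order
        self._entries = OrderedDict()

    def put(self, key, data, dirty=False):
        # Re-inserting moves the key to the most-recently-used end.
        self._entries.pop(key, None)
        self._entries[key] = (data, dirty)

    def flush(self, key):
        # Pretend the data is now safe on disk, so the entry becomes freeable.
        data, _ = self._entries[key]
        self._entries[key] = (data, False)

    def count_reclaimable(self):
        # How many entries could be freed right now (clean entries only).
        return sum(1 for _, dirty in self._entries.values() if not dirty)

    def shrink(self, nr_to_free):
        # Free up to nr_to_free clean entries, oldest first.
        # Dirty entries are skipped: they cannot go until they are flushed.
        freed = 0
        for key in list(self._entries):
            if freed >= nr_to_free:
                break
            _, dirty = self._entries[key]
            if dirty:
                continue
            del self._entries[key]
            freed += 1
        return freed

    def __len__(self):
        return len(self._entries)


cache = ReclaimableCache()
cache.put("a", b"clean page")
cache.put("b", b"dirty page", dirty=True)
cache.put("c", b"clean page")

freed_first = cache.shrink(5)   # only the two clean entries can be freed
cache.flush("b")                # once written back, "b" becomes reclaimable
freed_second = cache.shrink(1)
```

The point of the sketch is only the shape of the contract: the cache
reports what it could free, frees it when asked, and makes progress on
dirty data so that later requests can succeed.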

Chris