[lustre-discuss] Lustre OSS and clients on same physical server

Faaland, Olaf P. faaland1 at llnl.gov
Fri Jul 15 15:19:48 PDT 2016


Cory,

For what it's worth, the existing tests and framework run in the
single-node configuration without any special steps (or at least they did
within the last year or so).  You just build Lustre, run llmount to get
the servers up and a client mounted, and then run tests/sanity.sh.
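
Roughly, from a built source tree, the flow is something like this
(script names are from memory and may differ a bit between releases):

  cd lustre/tests
  sh llmount.sh           # format loopback targets, start servers, mount a client
  sh sanity.sh            # run the sanity test suite against that mount
  sh llmountcleanup.sh    # unmount the client and tear the servers down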

You then get varying results each time you do this.  Some tests are
themselves flawed (i.e. racy); others are fine in themselves but fail
intermittently because of some more general problem, like memory
management issues.  The issues that arise typically aren't easy to
diagnose, in my experience.

The problem is resources: spending them to investigate this behavior
instead of on better testing in the typical multi-node configuration,
implementing new features, doing code cleanup, etc.  In other words,
sadly, the usual problem.

-Olaf

On 7/15/16, 1:38 PM, "lustre-discuss on behalf of Cory Spitz"
<lustre-discuss-bounces at lists.lustre.org on behalf of spitzcor at cray.com>
wrote:

>Good input, Chris.  Thanks.
>
>It sounds like we might need to move this over to lustre-devel.
>
>Someday, I'd like to see us address some of these things and then add
>some test framework tests that co-locate clients with servers.  Not
>necessarily because we expect co-located services, but because it could
>be a useful driver of keeping Lustre a good memory manager.
>
>-Cory
>
>-- 
>
>
>On 7/15/16, 3:17 PM, "Christopher J. Morrone" <morrone2 at llnl.gov> wrote:
>
>On 07/15/2016 12:11 PM, Cory Spitz wrote:
>> Chris,
>> 
>> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J.
>>Morrone" <lustre-discuss-bounces at lists.lustre.org on behalf of
>>morrone2 at llnl.gov> wrote:
>> 
>>> If you put both the client and server code on the same node and do any
>>> serious amount of IO, it has been pretty easy in the past to get that
>>> node to go completely out to lunch thrashing on memory issues
>> 
>> Chris, you wrote "in the past."  How current is your experience?  I'm
>>sure it is still a good word of caution, but I'd venture that modern
>>Lustre (on a modern kernel) might fare a tad bit better.  Does anyone
>>have experience on current releases?
>
>Pretty recent.
>
>We have had memory management issues with servers and clients
>independently at pretty much every point in time, recent history
>included.  Putting the components together only exacerbates the issues.
>
>Lustre still has too many of its own caches with fixed, or nearly
>fixed, sizes, and too many places where it does not play well with the
>kernel's memory reclaim mechanisms.  There are too many places where
>Lustre ignores the kernel's requests for memory reclaim, and often goes
>on to use even more memory.  That significantly impedes the kernel's
>ability to keep things responsive when memory contention arises.
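>
>On a combined client/server node, for example, you mostly end up
>capping those caches by hand just to keep the box alive, with tunables
>roughly like these (parameter names from memory; check your release):
>
>  lctl set_param llite.*.max_cached_mb=4096        # cap client page cache
>  lctl set_param ldlm.namespaces.*.lru_size=1200   # pin the LDLM lock LRU size
>  lctl set_param ldlm.namespaces.*.lru_size=clear  # or drop cached locks entirely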
>
>> I understand that it isn't a design goal for us, but perhaps we should
>>pay some attention to this possibility?  Perhaps we'll have interest in
>>co-locating clients on servers in the near future as part of a
>>replication, network striping, or archiving capability?
>
>A lot of work is going to be needed to make Lustre's memory usage more
>dynamic, more aware of changing conditions on the system, and more
>responsive to the kernel's requests to free memory.  I imagine it won't
>be terribly easy, especially in areas such as dirty and unstable data,
>which cannot be freed until they are safe on disk.  But even for that,
>there are no doubt ways to make things better.
>
>Chris
>
>
>
>_______________________________________________
>lustre-discuss mailing list
>lustre-discuss at lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


