[Lustre-devel] Queries regarding LDLM_ENQUEUE

Paul Nowoczynski pauln at psc.edu
Wed Oct 20 09:43:46 PDT 2010


bzzz.tomas at gmail.com wrote:
> On 10/20/10 6:51 PM, Paul Nowoczynski wrote:
>   
>> Eric makes a good point in that only parallel jobs really need this
>> feature. Unfortunately, at scale the system (both clients and servers)
>> *really does* need something like this, especially if we continue pushing
>> users to perform N-1 file I/O instead of 'file per process'. I too agree
>> that some sort of capability mechanism is the best approach. I
>> wonder if this is something that could be done outside of POSIX and
>> supported through a parallel I/O library? Perhaps a single application
>> thread could make a special open call (/proc magic perhaps?) and obtain
>> the glob of opaque bytes, which is then broadcast to the rest of the
>> clients via MPI. Namespace traversal would then be avoided on all but one
>> client. In such a scenario I don't feel that enforcing Unix permissions
>> at every level of the path is needed or sensible; the operation should
>> be treated as a simple logical open. The question to the Lustre experts:
>> can enough state be packed into an opaque object that the
>> receiving client can construct the necessary cache state?
>>     
>
> could you explain why it is so important to skip the intermediate lookups?
> those need to be done only once; after that the clients resolve them
> locally from cache. is it because your nodes are getting new paths all
> the time, or because the nodes are rebooted very often and lose their cache?
>   
It's for scalability reasons. When N clients traverse the namespace
to open the same file, the result is a storm of RPC
requests that bear down on the metadata server. This type of activity
becomes prohibitive, especially once client counts exceed 10^4.
An operation like this is ripe for optimization because
every client in the network is trying to build the same state. If you
have a method for a single client to 'learn' the final state, i.e. the
pathname -> fid translation, and broadcast it to its cohorts, it's a
huge win because it eliminates an O(N) load on the metadata server.
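A minimal sketch of the RPC arithmetic behind this argument, under loudly stated assumptions: it counts only path-lookup RPCs, assumes a cold cache costs one lookup RPC per path component, and treats the resolved pathname -> fid translation as an opaque handle that other clients accept without contacting the metadata server. `OpaqueHandle`, the helper functions, and the FID values are hypothetical illustrations, not Lustre or MPI code; the broadcast is simulated as a binomial tree, the usual shape of an `MPI_Bcast`.

```python
"""Toy model of metadata-server (MDS) lookup load when N clients open one file.

Hypothetical model, not Lustre code: one lookup RPC per path component on a
cold cache, and the resolved handle can be forwarded between clients freely.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class OpaqueHandle:
    # Hypothetical stand-in for the "glob of opaque bytes" from a special open.
    path: str
    fid: tuple  # Lustre FIDs are (sequence, object id, version); values here are made up


def path_components(path):
    # "/a/b/c" -> ["a", "b", "c"]; each component costs one lookup when uncached.
    return [c for c in path.split("/") if c]


def naive_open_rpcs(n_clients, path):
    # Every client walks every component itself: O(N * depth) MDS lookups.
    return n_clients * len(path_components(path))


def broadcast_open_rpcs(n_clients, path):
    # One client walks the path; the others get the handle from peers,
    # so MDS lookup load no longer grows with n_clients.
    return len(path_components(path))


def simulate_broadcast(n_clients, handle):
    """Binomial-tree broadcast from rank 0; returns (per-rank copies, rounds)."""
    received = [None] * n_clients
    received[0] = handle
    rounds = 0
    step = 1
    while step < n_clients:
        # Ranks 0..step-1 already hold the handle; each forwards it to rank + step.
        for src in range(step):
            dst = src + step
            if dst < n_clients:
                received[dst] = handle
        step *= 2
        rounds += 1
    return received, rounds


if __name__ == "__main__":
    path = "/lustre/scratch/job/output.dat"
    n = 10_000
    handle = OpaqueHandle(path=path, fid=(0x200000401, 0x1, 0x0))
    copies, rounds = simulate_broadcast(n, handle)
    print(naive_open_rpcs(n, path), broadcast_open_rpcs(n, path), rounds)
    # prints: 40000 4 14
```

In this model the metadata server sees 4 lookups instead of 40,000, while the clients pay 14 rounds of peer-to-peer messages among themselves, which is the trade the broadcast approach is betting on.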
paul
> thanks, z
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
>   



