[Lustre-devel] faking LNET scale

Nicholas Henke nic at cray.com
Tue Jun 2 10:12:40 PDT 2009

Liang Zhen wrote:
> Nic,
> It's very late at night for me now, so my head is not clear enough to be 
> sure I'm not saying something crazy. :)
> LNet always treats the target as a remote network (one that needs a router) 
> if it can't find an NI with the same network ID. For example, if the local 
> NI is on ptl0 and the caller wants to send a message to ptl1, LNet will:
> 1. Try to find a local NI for ptl1; if that fails, then
> 2. Check whether ptl1 is a remote network and whether there is a router 
> for it.
> So if you want your server to have only one NI instance that can talk to a 
> set of different networks, while still reaching other remote networks via 
> routers, I would suggest:
> 1. Create a new command, for example "lctl add_local_net ptl0 ptl[1-N]", 
> which means LNet should allow NI (ptl0) to access networks ptl[1-N] as 
> local networks.
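> 2. Add a new structure to LNet, e.g.:
> typedef struct {
>         struct list_head  ln_list;     /* chained on lnet_t::ln_local_nets */
>         __u32             ln_net;      /* network to treat as local */
>         lnet_ni_t        *ln_localni;  /* local NI that serves ln_net */
>         ......
> } lnet_localnet_t;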
> As you can see, it is very similar to the existing structure 
> lnet_remotenet_t, which hangs off lnet_t::ln_remote_nets. The 
> "lctl add_local_net" command mentioned above would create an 
> lnet_localnet_t object and add it to a global list (e.g. 
> lnet_t::ln_local_nets).
> 3. When an upper-layer caller sends a message, lnet_send() should check 
> lnet_t::ln_local_nets first (before treating the target as a remote network 
> and checking lnet_t::ln_remote_nets). If the target network is on 
> lnet_t::ln_local_nets, we can use the local NI stored in 
> lnet_localnet_t::ln_localni.
> 4. We need to add a new LND flag; only LNDs with that flag can support the 
> "lctl add_local_net" command.
> 5. Make the LND accept, rather than reject, messages from other networks.
> Again, I hope I'm answering what you are asking. :)
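The lookup order Liang proposes in step 3 can be sketched in C as below. This is a minimal illustration under stated assumptions: ni_t, localnet_t, remotenet_t and select_route() are hypothetical simplified stand-ins, not the real LNet definitions; plain singly linked lists replace struct list_head, and router selection for remote networks is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the LNet types discussed in the
 * thread; these are NOT the real Lustre definitions. */
typedef struct ni {
        uint32_t ni_net;                 /* network this NI is configured on */
} ni_t;

typedef struct localnet {
        struct localnet *ln_next;        /* stands in for ln_list */
        uint32_t         ln_net;         /* network accepted as "local" */
        ni_t            *ln_localni;     /* NI that serves ln_net */
} localnet_t;

typedef struct remotenet {
        struct remotenet *lrn_next;
        uint32_t          lrn_net;       /* network reachable via a router */
} remotenet_t;

enum route_kind { ROUTE_LOCAL, ROUTE_REMOTE, ROUTE_UNREACHABLE };

/* Sketch of the proposed order: a real local NI first, then the new
 * local-net list, and only then the routed remote networks. */
static enum route_kind
select_route(uint32_t dst_net, ni_t **nis, size_t n_nis,
             localnet_t *local_nets, remotenet_t *remote_nets,
             ni_t **ni_out)
{
        size_t i;
        localnet_t *ln;
        remotenet_t *rn;

        /* 1. an NI configured on exactly this network */
        for (i = 0; i < n_nis; i++) {
                if (nis[i]->ni_net == dst_net) {
                        *ni_out = nis[i];
                        return ROUTE_LOCAL;
                }
        }
        /* 2. a network mapped onto a local NI (the add_local_net idea) */
        for (ln = local_nets; ln != NULL; ln = ln->ln_next) {
                if (ln->ln_net == dst_net) {
                        *ni_out = ln->ln_localni;
                        return ROUTE_LOCAL;
                }
        }
        /* 3. otherwise fall back to a routed remote network */
        for (rn = remote_nets; rn != NULL; rn = rn->lrn_next) {
                if (rn->lrn_net == dst_net) {
                        *ni_out = NULL;  /* router choice not modelled */
                        return ROUTE_REMOTE;
                }
        }
        *ni_out = NULL;
        return ROUTE_UNREACHABLE;
}
```

Checking the local-net list before the remote list is what keeps an aliased network such as ptl1 from being mistaken for a routed one.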

This is almost working, but I'm running into one problem: lnet_accept wants to 
match ni->ni_nid against the requested NID, and that check fails because the 
nets don't match (ptl1 vs ptl0).
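To see why the check fails, it helps to look at how a NID is built. The sketch below assumes a 64-bit NID that carries the network in its upper 32 bits (network type << 16 | network number) and the address in its lower 32 bits; the helper names and the PTL_TYPE_PLACEHOLDER value are made up for illustration, and the lnet_accept check is assumed to be a plain equality test. The same address on ptl0 and ptl1 yields two different NIDs, so comparing against ni->ni_nid rejects it.

```c
#include <stdint.h>

/* Made-up LND type value; NOT the real ptl LND type number. */
#define PTL_TYPE_PLACEHOLDER 4u

/* Hypothetical helpers for the assumed NID layout:
 * upper 32 bits = network (type << 16 | number), lower 32 bits = address. */
static uint32_t mknet(uint32_t type, uint32_t num) { return (type << 16) | num; }
static uint64_t mknid(uint32_t net, uint32_t addr) { return ((uint64_t)net << 32) | addr; }
static uint32_t nidnet(uint64_t nid)  { return (uint32_t)(nid >> 32); }
static uint32_t nidaddr(uint64_t nid) { return (uint32_t)(nid & 0xffffffffu); }

/* Simplified form of the match described above: the NID the peer asked
 * for must be exactly the NI's own NID, network bits included. */
static int nid_matches_ni(uint64_t ni_nid, uint64_t requested_nid)
{
        return requested_nid == ni_nid;
}
```

With ni_nid built as address 42 on ptl0 and the request as address 42 on ptl1, nidaddr() agrees but nidnet() differs, so nid_matches_ni() returns 0.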

It looks like there are a fair number of places like this, most using 

How should I handle those? Should I add another clause like ptlcompat (along the 
lines of ln_aliases) and, if it is set (i.e. aliases are configured), search for 
an alias that would allow NIDNET(lnet_net) == NIDNET(ptl_net)?
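One possible shape for the alias clause described above is sketched below. Everything here is hypothetical: alias_ni_t, ni_aliases and ni_accepts_nid() do not exist in LNet, and the NID layout (network in the upper 32 bits, address in the lower 32) is assumed. The requested NID is accepted when its address matches the NI's and its network is either the NI's own or one of the configured aliases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NI with a list of network aliases; none of these names
 * exist in LNet, they only illustrate the question above. */
typedef struct {
        uint64_t        ni_nid;       /* NID on the NI's own net, e.g. ptl0 */
        const uint32_t *ni_aliases;   /* extra nets treated as local, e.g. ptl[1-N] */
        size_t          ni_naliases;  /* 0 when no aliases are configured */
} alias_ni_t;

static uint32_t nid_net(uint64_t nid)  { return (uint32_t)(nid >> 32); }
static uint32_t nid_addr(uint64_t nid) { return (uint32_t)nid; }

/* Accept the requested NID if it names this NI's address on either the
 * NI's own network or one of its aliased networks. */
static int ni_accepts_nid(const alias_ni_t *ni, uint64_t req)
{
        size_t i;
        uint32_t net = nid_net(req);

        if (nid_addr(req) != nid_addr(ni->ni_nid))
                return 0;
        if (net == nid_net(ni->ni_nid))
                return 1;
        /* fall back to the alias list (empty when none are configured) */
        for (i = 0; i < ni->ni_naliases; i++)
                if (ni->ni_aliases[i] == net)
                        return 1;
        return 0;
}
```

Keeping the test in one helper would let the various call sites share it instead of each open-coding a NIDNET comparison.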

Is there a cleaner way?

