[Lustre-devel] lustre-1.8.8: rdma_listen() backlog 0 breaks iWARP

Steve Wise swise at opengridcomputing.com
Thu Jul 24 07:22:17 PDT 2014


> >Hello,
> >
> >I'm trying to get lustre-1.8.8/RHEL6 running over Chelsio iWARP RNICs, and
> >connection setup is failing at the server because kiblnd_startup() calls
> >rdma_listen() with a backlog of 0.  This effectively rejects all incoming
> >connection requests.  I looked at lustre-1.8.7, and the backlog was 256 in
> >that release.
> >
> >Q:  Why was it changed to 0?
> 
> Since I'm not familiar with the LNET code myself, I'd recommend checking the
> commit messages in Git, or the linked Jira/Bugzilla ticket, to see if there
> is an explanation.
> 
> You may also want to see if this is fixed with the 1.8.9 release.
> 

+ sean hefty
+ Isaac Huang

This commit changed the backlog to 0:

commit 7b442f1a43714455fad06c527b6fbc10f82af857
Author: Isaac Huang <he.h.huang at oracle.com>
Date:   Wed Nov 17 07:14:46 2010 -0700

    b=20153 add IB bonding failover support to o2iblnd

    O2iblnd changes to support failover events from an IB
    bonding IPoIB interface. Mostly to recreate device
    specific resources, e.g. listener CMID.

    i=isaac
    i=liang

Bug: https://projectlava.xyratex.com/show_bug.cgi?id=20153

I'm not sure why it was changed to 0, though.  It definitely breaks iWARP
support.  I'm not yet sure what the semantics are for creating a listening
cm_id with a backlog of 0.  Was the assumption that 0 means "let the system
choose" or "max supported backlog"?  The iWARP CM interprets 0 to mean that
no connection requests are allowed. :)
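
To make the two readings concrete, here is a minimal sketch in plain C.  It is
a toy model only, not the actual iw_cm or o2iblnd code: sketch_listener,
sketch_conn_request() and sketch_listen_backlog() are hypothetical names I made
up for illustration, and the 256 fallback is simply the value that lustre-1.8.7
passed to rdma_listen().

```c
#include <assert.h>

/* Assumed fallback: the backlog lustre-1.8.7 passed to rdma_listen(). */
#define SKETCH_DEFAULT_BACKLOG 256

/* Toy listener: 'backlog' caps the number of pending connection requests. */
struct sketch_listener {
    int backlog;  /* maximum number of queued connection requests */
    int pending;  /* connection requests currently queued */
};

/*
 * Strict reading (the one described above for the iWARP CM): a request is
 * queued only while pending < backlog, so backlog == 0 rejects everything.
 * Returns 0 if the request was queued, -1 if it was rejected.
 */
int sketch_conn_request(struct sketch_listener *l)
{
    if (l->pending >= l->backlog)
        return -1;
    l->pending++;
    return 0;
}

/*
 * "Let the system choose" reading: a caller-side guard that maps a
 * non-positive backlog to a default before it would be handed to
 * rdma_listen().
 */
int sketch_listen_backlog(int requested)
{
    assert(SKETCH_DEFAULT_BACKLOG > 0);
    return requested > 0 ? requested : SKETCH_DEFAULT_BACKLOG;
}
```

With the strict reading, a listener created with backlog 0 rejects its very
first request; passing the value through sketch_listen_backlog() first would
give it room for 256 pending requests instead.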

Isaac, can you explain?

Thanks,

Steve.




