[lustre-discuss] LNET ports and connections

Steve Crusan stevec at dug.com
Thu Feb 20 18:53:12 PST 2020


Can’t you also use tcpkill to kill the connections? I’ve used it before to
kill “stuck” NFS connections caused by MTU-related issues, etc.

It’s normally distributed with the dsniff package if you are using an
RPM-based distribution.

-Steve

On Thu, Feb 20, 2020 at 20:39 NeilBrown <neilb at suse.com> wrote:

>
> I haven't tried this, but you might be able to close a socket by:
>
> 1/ decrease /proc/sys/net/ipv4/tcp_keepalive_time
>   so that keep-alives get sent sooner.  Maybe set it to 60,
>   set tcp_keepalive_intvl to 5, and tcp_keepalive_probes to 3.
>
> 2/ create a rule with iptables to drop all messages sent
>   on the particular connection.
>    iptables -A OUTPUT -p tcp -m multiport --dports ... --sports ... -j DROP
>
>
> Given the suggested keepalive settings, you should only have to wait 75
> seconds (60 + 3 x 5) after creating the iptables rule before the
> connection is broken.
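
As a rough check of the 75-second figure above, here is a minimal Python
sketch of the arithmetic. The function name keepalive_timeout is made up
for illustration, and it assumes the socket has keepalive enabled and takes
its timing from these sysctls rather than from per-socket options.

```python
def keepalive_timeout(idle_s: int, interval_s: int, probes: int) -> int:
    """Seconds from the last traffic until an unresponsive peer is dropped.

    idle_s     -- tcp_keepalive_time: idle time before the first probe
    interval_s -- tcp_keepalive_intvl: gap between unanswered probes
    probes     -- tcp_keepalive_probes: unanswered probes before giving up
    """
    return idle_s + interval_s * probes


# Settings suggested in the thread: 60 / 5 / 3
print(keepalive_timeout(60, 5, 3))      # 75
# Usual Linux defaults: 7200 / 75 / 9
print(keepalive_timeout(7200, 75, 9))   # 7875
```

With the usual defaults the same blocked connection would linger for more
than two hours, which is why lowering the values first matters.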
>
> NeilBrown
>
>
> On Thu, Feb 20 2020, Degremont, Aurelien wrote:
>
> > Thanks. It feels like the theory is valid.
> > Ideally, to confirm it I would need a way to manually force-close the
> > socklnd socket to force the other peer to re-establish it.
> > I could not find a way to do that for sockets opened by kernel threads.
> >
> > On 19/02/2020 23:12, "NeilBrown" <neilb at suse.com> wrote:
> >
> >
> >     When LNet wants to send a message over a SOCKLND interface,
> >     ksocknal_launch_packet() is called.
> >
> >     This calls ksocknal_launch_all_connections_locked()
> >     This loops over all "routes" to the "peer" to make sure they all have
> >     "connections".
> >     If it finds a route without a connection (returned by
> >     ksocknal_find_connectable_route_locked()) it calls
> >     ksocknal_launch_connection_locked() which adds the connection
> >     request to ksnd_connd_routes, and wakes up the connd.  The connd
> >     thread will then make the connection.
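
The flow described above can be pictured with a toy Python model. This is
only an illustration of the queue-and-worker pattern, not the kernel code:
the classes Route and Peer, the functions launch_all_connections and
connd_run_once, and the addresses are all hypothetical, and connd_routes
merely stands in for ksnd_connd_routes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Route:
    addr: str                 # hypothetical peer address
    connected: bool = False   # a connection already exists
    connecting: bool = False  # a request is already queued


@dataclass
class Peer:
    routes: list = field(default_factory=list)


connd_routes = deque()  # stand-in for ksnd_connd_routes


def launch_all_connections(peer: Peer) -> None:
    """Queue a connection request for every route that lacks a connection."""
    for route in peer.routes:
        if not route.connected and not route.connecting:
            route.connecting = True
            connd_routes.append(route)


def connd_run_once() -> None:
    """Stand-in for the connd thread: make every queued connection."""
    while connd_routes:
        route = connd_routes.popleft()
        route.connecting = False
        route.connected = True  # the real thread would open a TCP socket here


peer = Peer(routes=[Route("10.0.0.1", connected=True), Route("10.0.0.2")])
launch_all_connections(peer)          # called on the send path
queued = [r.addr for r in connd_routes]
connd_run_once()
print(queued, all(r.connected for r in peer.routes))
```

The point of the model is that the sender, whichever side it is, triggers
the connection attempt simply by having a message to send.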
> >
> >     Hope that helps.
> >
> >     NeilBrown
> >
> >
> >
> >     On Wed, Feb 19 2020, Degremont, Aurelien wrote:
> >
> >     > Thanks! That's really interesting.
> >     > Do you have a code pointer that shows where the code will
> >     > establish this connection if it is missing?
> >     >
> >     > On 18/02/2020 23:34, "NeilBrown" <neilb at suse.com> wrote:
> >     >
> >     >
> >     >     It is not true that:
> >     >        LNET will establish connections only if asked for by
> >     >        upper layers.
> >     >
> >     >     or at least, not in the sense that the upper layers ask for a
> >     >     connection.
> >     >     Lustre knows nothing about connections.  Even LNet doesn't
> >     >     really know about connections. It is only at the socklnd level
> >     >     that connections mean much.
> >     >
> >     >     Lustre and LNet are message-passing protocols.
> >     >     Lustre asks LNet to send a message to a given peer, and gives
> >     >     some details of the sort of reply to expect.
> >     >     LNet chooses a route and thus a network interface, and asks
> >     >     the LND to send the message.
> >     >     The socklnd LND will see if it already has a TCP connection.
> >     >     If it does, it will use it.  If not, it will create one.
> >     >
> >     >     So yes, it is exactly:
> >     >       possible that the server in this case opens the connection
> >     >       itself without waiting for the client to reconnect?
> >     >
> >     >     NeilBrown
> >     >
> >     >
> >     >     On Tue, Feb 18 2020, Aurelien Degremont wrote:
> >     >
> >     >     > Thanks for your reply.
> >     >     > I think I have a good enough understanding of LNET itself.
> >     >     > My question was more about how LNET is used by Lustre.
> >     >     >
> >     >     > LNET will establish connections only if asked for by upper
> >     >     > layers.
> >     >     > When I was talking about client and server, I was talking
> >     >     > about how Lustre uses it.
> >     >     >
> >     >     > As far as I understand, Lustre servers only contact clients
> >     >     > when they need to send LDLM callbacks.
> >     >     > They do so through the socket already opened by the client
> >     >     > (reverse import).
> >     >     > What happens if that socket is closed is what I'm not sure
> >     >     > about. I thought the server would rather wait for the client
> >     >     > to reconnect and, if it does not, more or less evict it.
> >     >     > Could it be possible that the server in this case opens the
> >     >     > connection itself without waiting for the client to reconnect?
> >     >     >
> >     >     >
> >     >     > Aurélien
> >     >     >
> >     >     > On 18/02/2020 05:42, "NeilBrown" <neilb at suse.com> wrote:
> >     >     >
> >     >     >
> >     >     >     LNet is a peer-to-peer protocol, it has no concept of
> >     >     >     client and server.
> >     >     >     If one host needs to send a message to another but
> >     >     >     doesn't already have a connection, it creates a new
> >     >     >     connection.
> >     >     >     I don't yet know enough specifics of the lustre protocol
> >     >     >     to be certain of the circumstances when a lustre server
> >     >     >     will need to initiate a message to a client, but I imagine
> >     >     >     that recalling a lock might be one.
> >     >     >
> >     >     >     I think you should assume that any LNet node might
> >     >     >     receive a connection from any other LNet node (for which
> >     >     >     they share an LNet network), and that the connection could
> >     >     >     come from any port between 512 and 1023
> >     >     >     (LNET_ACCEPTOR_MIN_RESERVED_PORT to
> >     >     >     LNET_ACCEPTOR_MAX_RESERVED_PORT).
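
To make the firewall implication concrete, here is a small Python sketch of
the check a filter rule would effectively need to apply on every LNet node,
clients included. The function name and constant names are invented for
this example; the numbers are the ones quoted in this thread (default
listening port 988, source ports 512 to 1023).

```python
ACCEPTOR_PORT = 988      # default LNet listening port, per this thread
MIN_SOURCE_PORT = 512    # low end of the source-port range quoted above
MAX_SOURCE_PORT = 1023   # high end of the source-port range quoted above


def is_plausible_lnet_connect(src_port: int, dst_port: int) -> bool:
    """True if an incoming TCP connection matches the LNet pattern above."""
    return (dst_port == ACCEPTOR_PORT
            and MIN_SOURCE_PORT <= src_port <= MAX_SOURCE_PORT)


for src in (1021, 600, 40000):
    print(src, is_plausible_lnet_connect(src, 988))
```

A rule that only admits source ports 1021-1023 would reject the second
case, and a rule that only admits such connections on servers would miss
server-initiated connections to clients.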
> >     >     >
> >     >     >     NeilBrown
> >     >     >
> >     >     >
> >     >     >
> >     >     >     On Mon, Feb 17 2020, Degremont, Aurelien wrote:
> >     >     >
> >     >     >     > Hi all,
> >     >     >     >
> >     >     >     > From what I've understood so far, LNET listens on port
> >     >     >     > 988 by default and peers connect to it using TCP ports
> >     >     >     > 1021-1023 as source ports.
> >     >     >     > At the Lustre level, servers listen on 988 and clients
> >     >     >     > connect to them using the same source ports 1021-1023.
> >     >     >     > So only accepting connections to port 988 on the server
> >     >     >     > side sounded pretty safe to me. However, I've sometimes
> >     >     >     > seen connections from 1021-1023 to 988, from server
> >     >     >     > hosts to client hosts.
> >     >     >     > I can't understand what mechanism could trigger these
> >     >     >     > connections. Did I miss something?
> >     >     >     >
> >     >     >     > Thanks
> >     >     >     >
> >     >     >     > Aurélien
> >     >     >     >
> >     >     >     > _______________________________________________
> >     >     >     > lustre-discuss mailing list
> >     >     >     > lustre-discuss at lists.lustre.org
> >     >     >     > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >     >     >
> >     >
> >
>
-- 

*Steve Crusan*
Storage Specialist
DownUnder GeoSolutions
16200 Park Row Drive, Suite 100
Houston TX 77084, USA
tel +1 832 582 3221
stevec at dug.com
www.dug.com