[lustre-discuss] lustre-discuss Digest, Vol 167, Issue 14

Kevin M. Hildebrand kevin at umd.edu
Fri Feb 14 05:14:33 PST 2020


Yep, it looks like LU-12901 is indeed the issue.  Reducing peer_credits
from 128 to 42 makes the problem go away.
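
For reference, a minimal sketch of the workaround as a modprobe options
line.  Only peer_credits=42 comes from this thread.  The file path is an
assumption about where the options live, and the other parameters are
carried over unchanged from my original message.  Whether any of them
also need adjusting (peer_credits_hiw=64 is now larger than peer_credits,
for example) is not something this thread settles.

```
# /etc/modprobe.d/ko2iblnd.conf   (assumed location, adjust to your site)
# Only peer_credits differs from the original settings (was 128).
# Remaining values are copied as-is and have not been re-tuned.
options ko2iblnd peer_credits=42 peer_credits_hiw=64 credits=1024 \
        concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 \
        fmr_flush_trigger=512 fmr_cache=1
```

The new value only takes effect the next time the ko2iblnd module is
loaded, so lnet has to be reloaded (or the client rebooted) before
retrying the 'lctl ping'.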

Thanks,
Kevin

On Thu, Feb 13, 2020 at 4:25 PM <lustre-discuss-request at lists.lustre.org>
wrote:

>
> Today's Topics:
>
>    1. Re: Lustre 2.12.3 client can't mount filesystem (Weiss, Karsten)
>    2. Re: Lustre 2.12.3 client can't mount filesystem
>       (Kevin M. Hildebrand)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Feb 2020 08:11:08 +0000
> From: "Weiss, Karsten" <karsten.weiss at atos.net>
> To: "lustre-discuss at lists.lustre.org"
>         <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount
>         filesystem
> Message-ID: <cd1d4d54bbb4499998867447d1b8b56b at atos.net>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> this is probably https://jira.whamcloud.com/browse/LU-12901 which is
> still open and was just postponed to Lustre 2.14.0.
>
> Reducing peer_credits to 42 is a workaround.
>
> Best regards,
> Karsten
>
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> On Behalf
> Of Andreas Dilger
> Sent: Wednesday, February 12, 2020 21:50
> To: Kevin M. Hildebrand <kevin at umd.edu>
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount filesystem
>
> Can you please try 2.12.4?  It was just released yesterday and has a
> number of fixes.
>
>
> On Feb 12, 2020, at 13:36, Kevin M. Hildebrand <kevin at umd.edu> wrote:
>
> I just updated some of my clients to RHEL 7.7, Lustre 2.12.3, MOFED 4.7.
> Server version is 2.10.8.
>
> I'm now getting errors mounting the filesystem on the client.  In fact, I
> can't even do an 'lctl ping' to any of the servers without getting an I/O
> error.
>
> Debug logs show this message when I attempt an lctl ping:
> 00000800:00020000:0.0:1581538955.090767:0:20471:0:(o2iblnd.c:941:kiblnd_create_conn()) Can't create QP: -12, send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1
>
> # lctl list_nids
> 10.11.80.65@o2ib3
> # lctl ping 10.11.80.50@o2ib3
> failed to ping 10.11.80.50@o2ib3: Input/output error
>
> Interestingly, if I do an 'lctl ping' to the client _from_ the server, the
> ping succeeds, and from that point on pings from client _to_ server work
> fine until the client is rebooted or lnet is reloaded.
>
> ko2iblnd parameters match on clients and servers, namely:
> options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1
>
> Anyone have any thoughts?
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland
> Division of IT
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 13 Feb 2020 08:24:30 -0500
> From: "Kevin M. Hildebrand" <kevin at umd.edu>
> To: Andreas Dilger <adilger at whamcloud.com>
> Cc: "lustre-discuss at lists.lustre.org"
>         <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount
>         filesystem
> Message-ID:
>         <CAJmU7QmAmoYmb5ZaVYMeFPNi2p2qxOkTczM2bzKDbRzB9TNwtg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Ok, I just tried 2.12.4, and the problem still persists.  The only
> difference I see now is that the error messages are appearing in syslog
> instead of needing to pull them from the debug log.
> [  230.413761] LNetError: 1423:0:(o2iblnd.c:941:kiblnd_create_conn()) Can't create QP: -12, send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1
>
> Thanks,
> Kevin
>
>
> ------------------------------
>
> End of lustre-discuss Digest, Vol 167, Issue 14
> ***********************************************
>