[lustre-discuss] How to activate an OST on a client ?

Hans Henrik Happe happe at nbi.dk
Thu Aug 29 07:15:20 PDT 2024


Hi,

We just had a similar issue on 2.15.5: InfiniBand clients were not
reconnecting after a target outage.

Deleting the LNet net and importing the config again solved it without a
reboot or unmount:

# lnetctl net del --net o2ib
# lnetctl import < /etc/lnet.conf
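The two commands above can be wrapped in a small helper that saves the running LNet configuration first. This is a minimal sketch, not something posted in the thread: the function name `reset_lnet_net`, the backup path `/tmp/lnet-backup.yaml` and the `DRY_RUN` switch are hypothetical, and it assumes `/etc/lnet.conf` holds the YAML that `lnetctl import` expects.

```shell
#!/usr/bin/env bash
# Sketch only: reset_lnet_net, the backup path and DRY_RUN are hypothetical.

# Echo a command line, then run it through sh unless DRY_RUN=1.
run() {
    echo "+ $1"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        sh -c "$1"
    fi
}

# reset_lnet_net [net] [config file] [backup file]
reset_lnet_net() {
    local net="${1:-o2ib}"
    local conf="${2:-/etc/lnet.conf}"
    local backup="${3:-/tmp/lnet-backup.yaml}"

    # Save the running LNet configuration before changing anything.
    run "lnetctl export > $backup" || return 1
    # Drop the network whose peers stopped reconnecting.
    run "lnetctl net del --net $net" || return 1
    # Re-create it from the saved file; entries that still exist may be
    # reported as errors, so the status is not checked here.
    run "lnetctl import < $conf"
    # Show the resulting network for a manual check.
    run "lnetctl net show --net $net"
}
```

For example, `DRY_RUN=1 reset_lnet_net o2ib` only prints the four commands, so they can be reviewed before running the function for real on the affected client.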

Cheers,
Hans Henrik

On 28/08/2024 18.18, Lixin Liu via lustre-discuss wrote:
>
> We had the same problem after we upgraded our Lustre servers from 2.12.8
> to 2.15.3.
>
> Clients were running 2.15.3 on CentOS 7. Random OSTs dropped out
> frequently on busy login nodes (almost daily), but less so on compute
> nodes. The “lctl” command could not activate the OSTs, and rebooting was
> the only way to clear the problem.
>
> In June, we upgraded all client OSes to AlmaLinux 9.3 and Lustre to
> 2.15.4 on both servers and clients (we missed the 2.15.5 release by about
> 2 weeks). After the upgrade, we no longer have this problem.
>
> In our case, I wonder whether this was OmniPath related. The servers on
> AlmaLinux 8 were using the in-kernel driver, but the CentOS 7 clients were
> using the driver from the Intel/Cornelis release.
>
> The Alma 9 clients are now also using the in-kernel driver.
>
> Cheers,
>
> Lixin.
>
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
> behalf of Cameron Harr via lustre-discuss
> <lustre-discuss at lists.lustre.org>
> Reply-To: Cameron Harr <harr1 at llnl.gov>
> Date: Wednesday, August 28, 2024 at 8:19 AM
> To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: Re: [lustre-discuss] How to activate an OST on a client ?
>
> There's also an "lctl --device <dev> activate" that I've used in the
> past, though I don't know what conditions need to be met for it to work.
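Before trying `activate`, it can help to see which OSC imports the client itself reports as inactive. A minimal sketch, assuming the `osc.*.active` parameter that `lctl get_param` exposes on clients; the helper names `list_inactive_osc` and `activate_cmds` are hypothetical. The helpers only print `lctl set_param ... active=1` commands (an alternative to `lctl --device <dev> activate`) for review instead of running them, and, as Lixin reports in this thread, reactivation through `lctl` may not help when the connection itself is stuck.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical.

# Read "lctl get_param osc.*.active" output on stdin, one line per OSC such as
#   osc.<fsname>-OSTxxxx-osc-<id>.active=0
# and print the OSC device name of every entry whose value is 0.
list_inactive_osc() {
    awk -F= '$2 == 0 {
        name = $1
        sub(/^osc\./, "", name)     # drop the "osc." prefix
        sub(/\.active$/, "", name)  # drop the ".active" suffix
        print name
    }'
}

# Print (do not run) one reactivation command per inactive OSC.
activate_cmds() {
    list_inactive_osc | while read -r dev; do
        echo "lctl set_param osc.${dev}.active=1"
    done
}
```

Typical use would be `lctl get_param osc.*.active | activate_cmds`, checking the printed lines, and only then piping them to `sh`.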
>
> On 8/27/24 07:46, Andreas Dilger via lustre-discuss wrote:
>
>     Hi Jan,
>
>     There is "lctl --device XXXX recover" that will trigger a
>     reconnect to the named OST device (per "lctl dl" output), but I'm
>     not sure if that will help.
>
>     Cheers, Andreas
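The device number that `recover` (and the `activate` variant mentioned above) expects is the first column of `lctl dl`. A minimal sketch of looking it up, assuming the usual `lctl dl` layout (index, status, type, name, UUID, refcount) and client OSC names of the form `<fsname>-OSTxxxx-osc-<id>`; the helpers `osc_devno` and `lctl_device_cmds` are hypothetical and only print commands.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical.

# Read "lctl dl" output on stdin and print the device number of every OSC
# device whose name contains -<OST>-osc-, e.g. osc_devno OST0001.
osc_devno() {
    awk -v pat="-$1-osc-" '$3 == "osc" && index($4, pat) > 0 { print $1 }'
}

# Print (do not run) "lctl --device N <action>" for each matching OSC device.
# action defaults to "recover"; "activate" is the other command from the thread.
lctl_device_cmds() {
    local ost="$1" action="${2:-recover}"
    osc_devno "$ost" | while read -r n; do
        echo "lctl --device $n $action"
    done
}
```

For example, `lctl dl | lctl_device_cmds OST0001` prints the `recover` command for that OST, and `lctl dl | lctl_device_cmds OST0001 activate` prints the `activate` form.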
>
>
>
>         On Aug 22, 2024, at 06:36, Haarst, Jan van via lustre-discuss
>         <lustre-discuss at lists.lustre.org> wrote:
>
>         Hi,
>
>         Probably the wording of the subject doesn’t actually cover the
>         issue. What we see is this:
>
>         We have a client behind a router (linking tcp to Omnipath)
>         that shows an inactive OST (all on 2.15.5).
>
>         Other clients that go through the router do not have this issue.
>
>         One client had the same issue, although it showed a different
>         OST as inactive.
>
>         After a reboot, all was well again on that machine.
>
>         The clients can lctl ping the OSSs.
>
>         So although we have a workaround (reboot the client), it would
>         be nice to:
>
>          1. Fix the issue without a reboot
>          2. Fix the underlying issue.
>
>         It might be unrelated, but we also see another routing issue
>         every now and then:
>
>         The router stops routing requests towards a certain OSS, and
>         this can be fixed by deleting the peer_nid of the OSS from the
>         router.
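On the router, that workaround could look like the following: inspect the peer entry for the OSS, delete it, and check that the OSS answers again. A minimal sketch, assuming `lnetctl peer del --prim_nid` is the deletion being described; the helper `peer_reset_cmds` is hypothetical, prints the commands instead of running them, and rejects arguments that do not look like `<address>@<network>`.

```shell
#!/usr/bin/env bash
# Sketch only: peer_reset_cmds is a hypothetical helper.
# Given an OSS NID such as 10.0.0.5@o2ib, print (do not run) the commands to
# inspect the router's peer entry, delete it, and test reachability again.
peer_reset_cmds() {
    local nid="$1"
    # Rough sanity check: <address>@<network name, e.g. tcp, o2ib, o2ib1>.
    if [[ ! "$nid" =~ ^[^@[:space:]]+@[a-z][a-z0-9]*$ ]]; then
        echo "not a NID: $nid" >&2
        return 1
    fi
    echo "lnetctl peer show --nid $nid"
    echo "lnetctl peer del --prim_nid $nid"
    echo "lctl ping $nid"
}
```

Printing rather than executing keeps the destructive `peer del` step visible; the peer entry is expected to be re-created when traffic to that OSS resumes.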
>
>         I am probably missing informative logs, but I’m more than
>         happy to try to generate them, if somebody has a pointer to how.
>
>         We are a bit stumped right now.
>
>         With kind regards,
>
>         -- 
>
>         Jan van Haarst
>
>         HPC Administrator
>
>         For Anunna/HPC questions, please use https://support.wur.nl
>         (with HPC as service)
>
>         Present: Monday, Tuesday, Thursday & Friday
>
>         Facilities and Services (Facilitair Bedrijf), part of
>         Wageningen University & Research
>
>         Information Technology Department
>
>         P.O. Box 59, 6700 AB, Wageningen
>
>         Building 116, Akkermaalsbos 12, 6700 WB, Wageningen
>
>         http://www.wur.nl/nl/Disclaimer.htm
>
>         _______________________________________________
>         lustre-discuss mailing list
>         lustre-discuss at lists.lustre.org
>         http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
>