[lustre-discuss] OSTs waiting for clients on a pcs cluster

Meijering, Koos h.meijering at rug.nl
Thu Nov 18 07:58:21 PST 2021


Hi all,

We have built a Lustre server environment on CentOS 7 with Lustre
2.12.7.
The clients are running 2.12.5.
The setup consists of three clusters serving a 3 PB filesystem.
One cluster is a two-node cluster built for the MGS and the MDTs.
The other two clusters are also two-node clusters, used for the OSTs.
The cluster framework is working as expected.

The servers are connected to a multi-rail network, because some clients are
on IB and the other clients are on Ethernet.

But we have the following problem. When an OST fails over to the second node,
the clients are unable to contact the OST that is started on the other node.
The OST recovery status is then "waiting for clients".
When we fail it back, it starts working again and the recovery status is
complete.
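
For reference, this is roughly how the recovery state of each OST can be
collected on an OSS node. It is only a minimal Python sketch: it assumes the
status is read with `lctl get_param obdfilter.*.recovery_status`, and that the
output consists of a `name=` header per target followed by `key: value` lines.
The function names are illustrative and not part of Lustre.

```python
import subprocess


def parse_recovery_status(text):
    """Parse recovery_status text into {target_name: {key: value}}.

    Assumed layout: a line ending in '=' names a target, and the
    'key: value' lines that follow belong to that target.
    """
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("="):
            # Header line such as 'obdfilter.<fsname>-OST0000.recovery_status='
            current = line[:-1]
            targets[current] = {}
        elif ":" in line and current is not None:
            key, _, value = line.partition(":")
            targets[current][key.strip()] = value.strip()
    return targets


def stuck_targets(targets):
    """Return the names of targets whose status is neither COMPLETE nor INACTIVE."""
    done = ("COMPLETE", "INACTIVE")
    return sorted(
        name for name, fields in targets.items()
        if fields.get("status") not in done
    )


def read_recovery_status():
    """Run lctl on the local OSS (needs root and loaded Lustre modules)."""
    result = subprocess.run(
        ["lctl", "get_param", "obdfilter.*.recovery_status"],
        check=True, capture_output=True, text=True,
    )
    return parse_recovery_status(result.stdout)
```

Calling `stuck_targets(read_recovery_status())` on the node that took over
the OST should then list the targets that are still waiting.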

We tried to abort the recovery, but that did not work.

We used these documents to build the cluster:
https://wiki.lustre.org/Creating_the_Lustre_Management_Service_(MGS)
https://wiki.lustre.org/Creating_the_Lustre_Metadata_Service_(MDS)
https://wiki.lustre.org/Creating_Lustre_Object_Storage_Services_(OSS)
https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services

I'm not sure what the next steps should be to find the problem, or where to
look.

Best regards
Koos Meijering
........................................................................
HPC Team
Rijksuniversiteit Groningen
........................................................................