[lustre-discuss] OSTs waiting for clients on a pcs cluster

Meijering, Koos h.meijering at rug.nl
Fri Nov 19 03:01:22 PST 2021


Hi Colin,

I've attached three log files: one from the metadata server and two from the
object storage servers. The filesystem was working before these logs start;
I then asked the cluster to fail over muse-OST0001 from oss01 to oss02.
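
For anyone who wants to watch the recovery state during such a failover, here
is a minimal Python sketch that reads and parses the recovery_status parameter
of an OST. It assumes it runs as root on the OSS that currently holds the
target, that lctl is in the PATH, and that the output consists of plain
"key: value" lines. The function names and the SAMPLE text are made up for
illustration; they are not taken from the attached logs.

```python
import subprocess


def parse_recovery_status(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_recovery_status(target="muse-OST0001"):
    """Query lctl for the recovery status of one OST (needs a Lustre server)."""
    result = subprocess.run(
        ["lctl", "get_param", "-n", f"obdfilter.{target}.recovery_status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_recovery_status(result.stdout)


# Illustrative input only, not real output from this cluster.
SAMPLE = """status: WAITING_FOR_CLIENTS
connected_clients: 0/12
"""

if __name__ == "__main__":
    print(parse_recovery_status(SAMPLE))
```

Calling read_recovery_status() on oss02 right after the failover, and again
after failing back to oss01, gives the two states to compare.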


On Thu, 18 Nov 2021 at 17:11, Colin Faber <cfaber at gmail.com> wrote:

> Hi Koos,
>
> First thing -- it's generally a bad idea to run newer server versions with
> older clients (the opposite isn't true).
>
> Second -- do you have any logging that you can share from the client
> itself? (dmesg, syslog, etc)
>
> A quick test may be to run 2.12.7 clients against your cluster to verify
> that there is no interop problem.
>
> -cf
>
>
> On Thu, Nov 18, 2021 at 8:58 AM Meijering, Koos via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
>> Hi all,
>>
>> We have built a Lustre server cluster environment on CentOS 7 with Lustre
>> 2.12.7.
>> The clients are running 2.12.5.
>> The setup consists of three clusters serving a 3 PB filesystem.
>> One is a two-node cluster for the MGS and the MDTs.
>> The other two are also two-node clusters, used for the OSTs.
>> The cluster framework is working as expected.
>>
>> The servers are connected in a multi-rail network, because some clients
>> are on IB and the other clients are on Ethernet.
>>
>> But we have the following problem: when an OST fails over to the
>> second node, the clients are unable to contact the OST that is started on
>> the other node.
>> The OST recovery status is then "waiting for clients".
>> When we fail it back, it starts working again and the recovery status is
>> complete.
>>
>> We tried to abort the recovery but that does not work.
>>
>> We used these documents to build the cluster:
>> https://wiki.lustre.org/Creating_the_Lustre_Management_Service_(MGS)
>> https://wiki.lustre.org/Creating_the_Lustre_Metadata_Service_(MDS)
>> https://wiki.lustre.org/Creating_Lustre_Object_Storage_Services_(OSS)
>>
>> https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
>>
>> I'm not sure what the next steps should be to find the problem, or where
>> to look.
>>
>> Best regards
>> Koos Meijering
>> ........................................................................
>> HPC Team
>> Rijksuniversiteit Groningen
>> ........................................................................
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oss01.log
Type: text/x-log
Size: 915 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20211119/7afe1ba8/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oss02.log
Type: text/x-log
Size: 2266 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20211119/7afe1ba8/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mds01.log
Type: text/x-log
Size: 572 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20211119/7afe1ba8/attachment-0002.bin>

