[lustre-discuss] OSTs waiting for clients on a pcs cluster

Meijering, Koos h.meijering at rug.nl
Mon Nov 22 08:25:05 PST 2021


Hi Colin,

I have a small drawing that represents the setup; it's attached.


On Fri, 19 Nov 2021 at 22:49, Colin Faber <cfaber at gmail.com> wrote:

> Hi Koos,
>
> One thing you mentioned that I should have picked up on sooner, was "The
> servers are connected in a multirail network, because some clients are in
> IB and the other clients are on ethernet"
>
> Can you describe your topology? How are the various elements connected to
> each other?
>
> -cf
>
>
> On Fri, Nov 19, 2021 at 5:38 AM Meijering, Koos <h.meijering at rug.nl>
> wrote:
>
>> One more addition: I also see the following message on the OSS that had the
>> OST before the failover:
>> Nov 19 12:43:59 dh4-oss01 kernel: LustreError: 137-5: muse-OST0001_UUID:
>> not available for connect from 172.23.53.214@o2ib4 (no target). If you
>> are running an HA pair check that the target is mounted on the other server.
>>
>> On Fri, 19 Nov 2021 at 12:01, Meijering, Koos <h.meijering at rug.nl> wrote:
>>
>>> Hi Colin,
>>>
>>> I've attached 3 log files here: 1 from the metadata server and 2 from the
>>> object stores.
>>> Before these logs start, the filesystem was working; I then requested
>>> the cluster to fail over muse-OST0001 from oss01 to oss02.
>>>
>>>
>>> On Thu, 18 Nov 2021 at 17:11, Colin Faber <cfaber at gmail.com> wrote:
>>>
>>>> Hi Koos,
>>>>
>>>> First thing -- it's generally a bad idea to run newer server versions
>>>> with older clients (the opposite isn't true).
>>>>
>>>> Second -- do you have any logging that you can share from the client
>>>> itself? (dmesg, syslog, etc)
>>>>
>>>> A quick test may be to run 2.12.7 clients against your cluster to
>>>> verify that there is no interop problem.
>>>>
>>>> -cf
>>>>
>>>>
>>>> On Thu, Nov 18, 2021 at 8:58 AM Meijering, Koos via lustre-discuss <
>>>> lustre-discuss at lists.lustre.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have built a Lustre server environment on CentOS 7 with
>>>>> Lustre 2.12.7.
>>>>> The clients are using 2.12.5.
>>>>> The setup is 3 clusters for a 3PB filesystem.
>>>>> One cluster is a two-node cluster built for the MGS and MDTs.
>>>>> The other two clusters are also two-node clusters, used for the OSTs.
>>>>> The cluster framework is working as expected.
>>>>>
>>>>> The servers are connected in a multirail network, because some clients
>>>>> are in IB and the other clients are on ethernet
>>>>>
>>>>> But we have the following problem: when an OST fails over to the
>>>>> second node, the clients are unable to contact the OST that is started on
>>>>> the other node.
>>>>> The OST recovery status is "waiting for clients".
>>>>> When we fail it back, it starts working again and the recovery status
>>>>> is complete.
>>>>>
>>>>> We tried to abort the recovery, but that did not work.
>>>>>
>>>>> We used these documents to build the cluster:
>>>>> https://wiki.lustre.org/Creating_the_Lustre_Management_Service_(MGS)
>>>>> https://wiki.lustre.org/Creating_the_Lustre_Metadata_Service_(MDS)
>>>>> https://wiki.lustre.org/Creating_Lustre_Object_Storage_Services_(OSS)
>>>>>
>>>>> https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
>>>>>
>>>>> I'm not sure what the next steps should be to find the problem, or where
>>>>> to look.
>>>>>
>>>>> Best regards
>>>>> Koos Meijering
>>>>>
>>>>> ........................................................................
>>>>> HPC Team
>>>>> Rijksuniversiteit Groningen
>>>>>
>>>>> ........................................................................
>>>>> _______________________________________________
>>>>> lustre-discuss mailing list
>>>>> lustre-discuss at lists.lustre.org
>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>>>
>>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DH4_Lustre_net.png
Type: image/png
Size: 52007 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20211122/e749b5c6/attachment-0001.png>

