[lustre-discuss] MDS using D3710 DAS - partially Solved

Sid Young sid.young at gmail.com
Thu Feb 18 17:17:04 PST 2021


After some investigation, it looks like a timeout issue in the smartpqi
kernel module is causing the disks to be removed soon after they are
initially added, based on what is reported in "dmesg".
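
For anyone wanting to check their own kernel log for the same pattern, here
is a minimal Python sketch that pulls out the lines mentioning the driver.
The function names are made up for illustration, and matching on the plain
string "smartpqi" is an assumption about how the driver's messages are
tagged; adjust it to whatever your dmesg output actually shows.

```python
import subprocess


def filter_driver_lines(log_text, driver="smartpqi"):
    """Return the lines of log_text that mention the driver name.

    The match is a case-insensitive substring test; the default name
    "smartpqi" is an assumption about how the messages are tagged.
    """
    needle = driver.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def read_kernel_log():
    """Return the kernel ring buffer as text by running `dmesg`.

    Reading the ring buffer may require root on some systems.
    """
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling filter_driver_lines(read_kernel_log()) on each MDS node and
comparing the results around the time the disks are added and then removed
should show whether both hosts hit the same messages.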

This issue first occurred in RHEL/CentOS 7.4 and should have been resolved
by CentOS 7.7. I've emailed the maintainer of the module, and he's come
back to me with an offer to create a test driver to see if increasing the
timeout fixes the issue. There is an existing patch, but its version is
older than the one in CentOS 7.9.

On the bright side, I've built and rebuilt the Lustre MDS and OSS config
several times as I optimise the installation while running under
Pacemaker, and I have been able to mount /lustre and /home on the compute
nodes, so this new system is 50% of the way there :)


Sid Young


> Today's Topics:
>
>    1. Re: MDS using D3710 DAS (Sid Young)
>    2. Re: MDS using D3710 DAS (Christopher Mountford)
>
>
>
> ---------- Forwarded message ----------
> From: Sid Young <sid.young at gmail.com>
> To: Christopher Mountford <cjm14 at leicester.ac.uk>,
> lustre-discuss at lists.lustre.org
> Cc:
> Bcc:
> Date: Mon, 15 Feb 2021 08:42:43 +1000
> Subject: Re: [lustre-discuss] MDS using D3710 DAS
> Hi Christopher,
>
> Just some background: all servers are DL385s, all running the same image
> of CentOS 7.9. The MDS HA pair has a SAS-connected D3710, and the dual OSS
> HA pairs have a D8000 each, with 45 disks in each of them.
>
> The D3710 (which has 24x 960G SSDs) seems a bit hit and miss at
> presenting two LVs. I had set up a /lustre and a /home, for which I was
> going to use ldiskfs rather than zfs. However, I am finding that the disks
> MAY present to both servers after some reboots, but usually the first
> server to reboot sees the LVs presented and the other sees only its local
> internal disks, so the array appears to present the LVs to only one host
> most of the time.
>
> With the 4 OSS servers I see the same issue: sometimes the LVs present
> and sometimes they don't.
>
> I was planning on setting up the OSTs as ldiskfs as well, but I could
> also go zfs; my test bed system and my current HPC use ldiskfs.
>
> Correct me if I am wrong, but the disks should present to both servers all
> the time, and using PCS I should be able to mount /lustre and /home on the
> first server while the disks are also presented to the second server, where
> no software is mounting them, so there should be no issues?
>
>
> Sid Young
>
> On Fri, Feb 12, 2021 at 7:27 PM Christopher Mountford <
> cjm14 at leicester.ac.uk> wrote:
>
>> Hi Sid,
>>
>> We've a similar hardware configuration - 2 MDS pairs and 1 OSS pair which
>> each consist of 2 DL360 connected to a single D3700. However we are using
>> Lustre on ZFS with each array split into 2 or 4 zpools (depending on the
>> usage) and haven't seen any problems of this sort. Are you using ldiskfs?
>>
>> - Chris
>>
>>
>> On Fri, Feb 12, 2021 at 03:14:58PM +1000, Sid Young wrote:
>> >    G'day all,
>> >    Is anyone using an HPE D3710 with two HPE DL380/385 servers in an MDS
>> >    HA configuration? If so, is your D3710 presenting LVs to both servers
>> >    at the same time AND are you using PCS with the Lustre PCS resources?
>> >    I've just received new kit and cannot get disks to present to the MDS
>> >    servers at the same time..... :(
>> >    Sid Young
>>
>> > _______________________________________________
>> > lustre-discuss mailing list
>> > lustre-discuss at lists.lustre.org
>> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
>
>
> ---------- Forwarded message ----------
> From: Christopher Mountford <cjm14 at leicester.ac.uk>
> To: Sid Young <sid.young at gmail.com>
> Cc: Christopher Mountford <cjm14 at leicester.ac.uk>,
> lustre-discuss at lists.lustre.org
> Bcc:
> Date: Mon, 15 Feb 2021 10:44:10 +0000
> Subject: Re: [lustre-discuss] MDS using D3710 DAS
>
> Hi Sid.
>
> We use the D3700s (and our D8000s) as JBODs with zfs providing the
> redundancy - do you have some kind of hardware RAID? If so, are your RAID
> controllers the array controllers or on the HBAs? Off the top of my head,
> if the latter, there might be an issue with multiple HBAs trying to
> assemble the same RAID array?
>
> - Chris.
>