[lustre-discuss] ZFS and multipathing for OSTs

Riccardo Veraldi riccardo.veraldi at gmail.com
Fri Apr 26 09:50:13 PDT 2019


In my experience multipathd+ZFS works well, and it has usually worked fine for me.
When a disk breaks I just remove it, replace it, the new multipath device is
added once the disk is replaced, and then I start resilvering.
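
Roughly the sequence, with a made-up pool name and placeholder mpath names:

zpool status ostpool                      # identify the failed multipath device
zpool offline ostpool mpathXX             # (optional) take it offline first
multipath -f mpathXX                      # drop the stale multipath map
# swap the physical disk; multipathd brings up the new map (say mpathYY)
zpool replace ostpool mpathXX /dev/mapper/mpathYY
zpool status ostpool                      # watch the resilver progress
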
That said, I found out this does not always work with some JBOD disk
array/firmware versions.
A Proware controller I had did not recognize that a disk had been replaced,
but that was not a multipathd problem in my case.
So my hint is to try it out with your hardware and see how it behaves.

On 26/04/2019 16:57, Kurt Strosahl wrote:
>
> Hey, thanks!
>
>
> I tried the multipathing part you had down there and I couldn't get it
> to work... I did find that this worked, though:
>
>
> #I pick a victim device
> multipath -ll
> ...
> mpathax (35000cca2680a8194) dm-49 HGST    ,HUH721010AL5200 
> size=9.1T features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=enabled
>   |- 1:0:10:0   sdj     8:144   active ready running
>   `- 11:0:9:0   sddy    128:0   active ready running
> #then I remove the device
> multipath -f mpathax
> #and verify that it is gone
> multipath -ll | grep mpathax
> #then I run the following, which seems to rescan for devices.
> multipath -v2
> Apr 26 10:49:06 | sdj: No SAS end device for 'end_device-1:1'
> Apr 26 10:49:06 | sddy: No SAS end device for 'end_device-11:1'
> create: mpathax (35000cca2680a8194) undef HGST    ,HUH721010AL5200 
> size=9.1T features='0' hwhandler='0' wp=undef
> `-+- policy='service-time 0' prio=1 status=undef
>   |- 1:0:10:0   sdj     8:144   undef ready running
>   `- 11:0:9:0   sddy    128:0   undef ready running
> #then it's back
> multipath -ll mpathax
> mpathax (35000cca2680a8194) dm-49 HGST    ,HUH721010AL5200 
> size=9.1T features='0' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=enabled
>   |- 1:0:10:0   sdj     8:144   active ready running
>   `- 11:0:9:0   sddy    128:0   active ready running
>
> I still need to test it fully once I get the whole stack up and
> running, but this seems to be a step in the right direction.
>
>
> w/r,
> Kurt
>
> ------------------------------------------------------------------------
> *From:* Jongwoo Han <jongwoohan at gmail.com>
> *Sent:* Friday, April 26, 2019 6:28 AM
> *To:* Kurt Strosahl
> *Cc:* lustre-discuss at lists.lustre.org
> *Subject:* Re: [lustre-discuss] ZFS and multipathing for OSTs
>  
> Disk replacement with multipathd + ZFS is somewhat inconvenient.
>
> step1: mark offline the disk you should replace with zpool command
> step2: remove disk from multipathd table with multipath -f <mpath id>
> step3: replace disk
> step4: add disk to multipath table with multipath -ll <mpath id>
> step5:  replace disk in zpool with zpool replace
>
> Try this in your test environment and tell us if you find anything
> interesting in the syslog.
> In my case, replacing a single disk in a multipathd+ZFS pool triggered a
> massive udevd partition scan.
>
> Thanks
> Jongwoo Han
>
> On Fri, Apr 26, 2019 at 3:44 AM Kurt Strosahl <strosahl at jlab.org
> <mailto:strosahl at jlab.org>> wrote:
>
>     Good Afternoon,
>
>
>     As part of a new Lustre deployment I've now got two disk
>     shelves connected redundantly to two servers.  Since each disk has
>     two paths to the server I'd like to use multipathing for both
>     redundancy and improved performance.  I haven't found examples or
>     discussion about such a setup, and was wondering if there are any
>     resources out there that I could consult.
>
>
>     Of particular interest would be examples of the
>     /etc/zfs/vdev_id.conf and any tuning that was done.  I'm also
>     wondering about extra steps that may have to be taken when doing a
>     disk replacement to account for the multipathing.  I've got plenty
>     of time to experiment with this process, but I'd rather not
>     reinvent the wheel if I don't have to.
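>
>     (For the vdev_id.conf piece, something along the lines of the multipath
>     example in vdev_id.conf(5) is what I'm imagining; the PCI slots, channel
>     names, and alias below are only placeholders for our hardware:)
>
>     multipath yes
>     #       PCI_SLOT  HBA PORT  CHANNEL NAME
>     channel 85:00.0   1         A
>     channel 85:00.0   0         B
>     channel 86:00.0   1         A
>     channel 86:00.0   0         B
>     # individual LUNs can also be given names by multipath WWID
>     alias d49   /dev/disk/by-id/dm-uuid-mpath-35000cca2680a8194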
>
>
>     w/r,
>
>     Kurt J. Strosahl
>     System Administrator: Lustre, HPC
>     Scientific Computing Group, Thomas Jefferson National Accelerator
>     Facility
>
>     _______________________________________________
>     lustre-discuss mailing list
>     lustre-discuss at lists.lustre.org
>     <mailto:lustre-discuss at lists.lustre.org>
>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
>
> -- 
> Jongwoo Han
> +82-505-227-6108
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



