[lustre-discuss] Configuring Lustre failover on NON-shared targets/disks

sohamm sohamm at gmail.com
Sat May 21 00:04:39 PDT 2016


Hi Andreas

It took me some time to get back to this. I started trying out this
configuration on a bunch of VMs with powerful underlying hardware.

*Configuration:*
1 physical machine hosts 2 VMs (Vm1 and Vm2). Both of them have
kernel 3.10.0-327.13.1.el7_lustre.x86_64, ZFS, and iSCSI.
Vm1 - disk 1, disk 2
Vm2 - disk 3, disk 4

After iSCSI setup (each VM also sees one disk exported by the other):
Vm1 - disk 1, disk 3
Vm2 - disk 4, disk 2
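
For anyone reproducing this, the initiator side of the cross-export can be
set up roughly like this (the portal address and target name below are just
placeholders, not my actual values):

iscsiadm -m discovery -t sendtargets -p <vm2-ip>
iscsiadm -m node -T iqn.2016-05.local:vm2-disk3 -p <vm2-ip> --login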

After zpool setup:
Vm1 - disk 1 || disk 3 (zpool mirror) - for MGS
Vm2 - disk 4 || disk 2 (zpool mirror) - for MDT
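
Such a mirrored pool can be created roughly like this (device names taken
from the zpool status output below):

zpool create mds1_2 mirror sdb vdb2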

[root@lustre_mgs01_vm03 ~]# zpool status
  pool: mds1_2
  state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mds1_2      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            vdb2    ONLINE       0     0     0


When setting up the MGS/MDT, I get the following error:

[root@lustre_mgs01_vm03 /]# mkfs.lustre --mgs --backfstype=zfs mds1_2/mgs
mkfs.lustre FATAL: unhandled/unloaded fs type 5 'zfs'
mkfs.lustre FATAL: unable to prepare backend (22)
mkfs.lustre: exiting with 22 (Invalid argument)

When I searched for this specific error, I ran into this Jira ticket:
https://jira.hpdd.intel.com/browse/LU-7601
I have Lustre version:
[root@lustre_mgs01_vm03 /]# cat /proc/fs/lustre/version
lustre: 2.8.53_11_gfd4ab6e
kernel: patchless_client
build:  2.8.53_11_gfd4ab6e
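
A few things I still plan to check for this error (these are my guesses, not
confirmed fixes): whether the ZFS-enabled OSD pieces of Lustre are installed
at all, and whether the osd_zfs module loads, e.g.

rpm -qa | egrep 'lustre-osd-zfs|zfs'       # package names assumed from the usual Lustre/ZFS packaging
modprobe osd_zfs && lsmod | grep osd_zfs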

I found an earlier discussion on a similar topic. I plan to set up something
similar, but with iSCSI instead of shared storage boxes. I don't see output
similar to that thread's for the mkfs.lustre command:
https://lists.01.org/pipermail/hpdd-discuss/2013-December/000662.html

I understand that this might not be a regular setup, but I would like to
set it up and see the performance if possible.
Please let me know if I am missing something.

Thanks
Divakar

On Sat, Feb 6, 2016 at 1:57 AM, Dilger, Andreas <andreas.dilger at intel.com>
wrote:

> On 2016/02/05, 17:08, "lustre-discuss on behalf of sohamm" <
> lustre-discuss-bounces at lists.lustre.org on behalf of sohamm at gmail.com>
> wrote:
>
> Hi
>
> I have been reading a bunch of documents on how failures are handled in
> Lustre, and almost all of them seem to indicate that I would need shared
> disks/targets for an MDS or OSS failover configuration. I want to know if a
> failover configuration is possible without shared disks. E.g., I have one
> physical box I want to configure as OSS/OST and another as MGS/MDS/MDT. Each
> physical box will have its own HDDs/SSDs, and they are connected via
> Ethernet. Please guide me and point me to any good documentation available
> for such a configuration.
>
>
> It is _possible_ to do this without shared disks, if there is some other
> mechanism to make the data available on both nodes.  One option is to use
> iSCSI targets (SRP or iSER) and mirror the drives across the two servers
> using ZFS, making sure you serve each mirrored device from only one node.
> Then, if the primary server fails you can mount the filesystem on the
> backup node. This is described in
> http://wiki.lustre.org/MDT_Mirroring_with_ZFS_and_SRP and
> http://cdn.opensfs.org/wp-content/uploads/2011/11/LUG-2012.pptx .
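>
> A minimal sketch of the failover step under that scheme (pool name, dataset
> and mount point are only examples): on the surviving node, force-import the
> pool the failed server was serving and mount the target from there:
>
> zpool import -f mds1_2
> mount -t lustre mds1_2/mgs /mnt/mgs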
>
> Note that if you only have a 2-way mirror you've lost 1/2 of your disks
> during failover.  That might be OK for the MDT if it has been configured
> correctly, since there are additional copies of metadata.  For the OST you
> could use RAID-1+5 or RAID-1+6 (e.g. mirror of RAID-5/6 devices on each
> node).  With a more complex configuration it would even potentially be
> possible to export iSCSI disks from a group of nodes and use RAID-6 of
> disks from different nodes so that redundancy isn't lost when a single node
> goes down.  That might get hairy during configuration for a large system.
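>
> As a rough sketch of the mirror-of-RAID-6 variant (device names are only
> illustrative): each node builds a local RAID-6 with mdadm and exports it via
> iSCSI, and the active server mirrors its local array with the remote one:
>
> mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
> zpool create ostpool mirror /dev/md0 /dev/disk/by-path/<remote-iscsi-lun>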
>
> Another alternative to iSCSI+ZFS would be some other form of network block
> device (e.g. NBD or DRBD) and then build your target on top of that.  It is
> essentially the same but the consistency is managed by the block device
> instead of the filesystem. IMHO (just a gut feeling, never tested) having a
> "robust" network block device would be slower than having ZFS do this
> because the block device doesn't know the details of what the filesystem is
> doing, and will add its own overhead to provide its own consistency in
> addition to the consistency provided by ZFS itself.
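>
> For the DRBD route, a minimal resource definition would look roughly like
> this (hostnames and addresses are placeholders), with the Lustre target then
> built on /dev/drbd0 rather than the raw disk:
>
> resource ost0 {
>   device    /dev/drbd0;
>   disk      /dev/sdb;
>   meta-disk internal;
>   on oss1 { address 10.0.0.1:7789; }
>   on oss2 { address 10.0.0.2:7789; }
> }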
>
> That said, this isn't a typical Lustre configuration, but I think there
> would definitely be other interested parties if you tried this out and
> reported your results back here.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel High Performance Data Division
>
>