[lustre-discuss] Configuring Lustre failover on NON shared targets/disks

Dilger, Andreas andreas.dilger at intel.com
Sat Feb 6 01:57:19 PST 2016

On 2016/02/05, 17:08, "lustre-discuss on behalf of sohamm" <lustre-discuss-bounces at lists.lustre.org on behalf of sohamm at gmail.com> wrote:


I have been reading a bunch of documents on how failures are handled in Lustre, and almost all of them seem to indicate that I would need a shared disk/target for an MDS or OSS failover configuration. I want to know if a failover configuration is possible without shared disks. E.g., I have one physical box I want to configure as OSS/OST and another as MGS/MDS/MDT. Each physical box will have its own HDDs/SSDs, and they are connected via ethernet. Please guide me and point me to any good documentation available for such a configuration.

It is _possible_ to do this without shared disks, if there is some other mechanism to make the data available on both nodes.  One option is to export the drives as SCSI block targets (iSCSI, SRP, or iSER) and mirror them across the two servers using ZFS, making sure you serve each mirrored device from only one node.  Then, if the primary server fails, you can mount the filesystem on the backup node. This is described in http://wiki.lustre.org/MDT_Mirroring_with_ZFS_and_SRP and http://cdn.opensfs.org/wp-content/uploads/2011/11/LUG-2012.pptx .
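As a rough illustration of that setup, something like the following could work; this is only a sketch, and all device names, IP addresses, IQNs, filesystem names, and NIDs here are hypothetical, not from the cited pages:

```shell
# On server B: export a local disk as an iSCSI target (LIO/targetcli).
targetcli /backstores/block create name=ost0disk dev=/dev/sdb
targetcli /iscsi create iqn.2016-02.org.example:ost0
targetcli /iscsi/iqn.2016-02.org.example:ost0/tpg1/luns create /backstores/block/ost0disk

# On server A: attach the remote disk over iSCSI.
iscsiadm -m discovery -t sendtargets -p 192.168.1.2
iscsiadm -m node -T iqn.2016-02.org.example:ost0 -l

# Mirror the local disk (/dev/sdb) with the iSCSI-attached remote disk
# (appearing here as /dev/sdc), then format the OST on top of the pool.
zpool create ostpool mirror /dev/sdb /dev/sdc
mkfs.lustre --ost --backfstype=zfs --fsname=testfs --index=0 \
    --mgsnode=mgs@tcp ostpool/ost0
```

On failover, the backup node would import the pool (`zpool import -f ostpool`) and mount the target there, since only one node may serve the pool at a time.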

Note that if you only have a 2-way mirror, you've lost half of your disks during failover.  That might be OK for the MDT if it has been configured correctly, since ZFS keeps additional copies of metadata.  For the OST you could use RAID-1+5 or RAID-1+6 (i.e. a mirror of RAID-5/6 devices, one on each node).  With a more complex configuration it would even be possible to export iSCSI disks from a group of nodes and build RAID-6 over disks from different nodes, so that redundancy isn't lost when a single node goes down.  That might get hairy to configure for a large system.
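The RAID-1+5 idea could be sketched like this; again, a hypothetical example only, assuming each node's RAID-5 device is exported over iSCSI as in the earlier sketch, with made-up device names:

```shell
# On each node: build a local RAID-5 set from four disks with mdadm.
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[b-e]

# On the node currently serving the target: mirror the local RAID-5
# device with the remote node's RAID-5 device (attached over iSCSI,
# appearing here as /dev/sdf), giving RAID-1+5 overall.
zpool create ostpool mirror /dev/md0 /dev/sdf
```

With this layout, losing a whole node drops only one side of the mirror, and the surviving side can still tolerate a single disk failure within its RAID-5 set.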

Another alternative to iSCSI+ZFS would be some other form of network block device (e.g. NBD or DRBD), building your target on top of that.  It is essentially the same, but consistency is managed by the block device instead of the filesystem. IMHO (just a gut feeling, never tested) a "robust" network block device would be slower than having ZFS do the mirroring, because the block device doesn't know what the filesystem is doing and adds its own overhead to provide its own consistency on top of the consistency ZFS already provides.
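For the DRBD variant, a minimal sketch might look like the following; hostnames, addresses, devices, and the filesystem name are all hypothetical:

```shell
# /etc/drbd.d/ost0.res, identical on both nodes (DRBD 8.x syntax):
cat > /etc/drbd.d/ost0.res <<'EOF'
resource ost0 {
    protocol C;              # synchronous replication to the peer
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;
    on oss1 { address 192.168.1.1:7789; }
    on oss2 { address 192.168.1.2:7789; }
}
EOF

# On both nodes: initialize metadata and bring the resource up.
drbdadm create-md ost0
drbdadm up ost0

# On the chosen primary only: promote it and format the Lustre target.
drbdadm primary --force ost0
mkfs.lustre --ost --fsname=testfs --index=0 --mgsnode=mgs@tcp /dev/drbd0
```

On failover you would promote the surviving node (`drbdadm primary ost0`) and mount the target there; with `protocol C` every write is acknowledged by both nodes, which is where the extra consistency overhead mentioned above comes from.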

That said, this isn't a typical Lustre configuration, but I think there would definitely be other interested parties if you tried this out and reported your results back here.

Cheers, Andreas
Andreas Dilger
Lustre Principal Architect
Intel High Performance Data Division
