[Lustre-discuss] Reliability improvement system for Lustre

Thu May 24 22:38:09 PDT 2012

On 2012-05-24, at 7:57 PM, 李希 wrote:
> MTFS is a kernel-space stackable file system, similar to
> eCryptfs, Aufs or RAIF.
> 
> For example, suppose we want to keep replicas in /mnt/lustre/dir1 and
> /mnt/lustre/dir2. We do the following.
> First, set the striping of these directories so that their objects are
> placed on different OSTs. OST pools can be used to achieve this.
> Second, mount MTFS on top of these directories:
>     mount -t mtfs /mnt/lustre/dir1:/mnt/lustre/dir2 /mnt/mtfs
> Third, use /mnt/mtfs as a normal file system.
> MTFS will automatically write replicas to both branches, so even
> if the OSSs, OSTs or networks of a single OST pool fail, we can still
> access data through /mnt/mtfs normally.

Interesting.

We have been discussing something similar built into Lustre itself for some time, but getting the behaviour correct in the face of system failures (which is when it is most useful) is much more difficult than getting correct behaviour during normal system operation.

If the client crashes while writing to both copies of a file, how does MTFS determine which file has the correct data in it?  Does it do synchronous IO to one or both files?
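
To make the question concrete, here is a minimal user-space sketch in Python. It is not MTFS code and says nothing about how MTFS is actually implemented; the function names and the crash_after parameter are hypothetical, and two temporary directories stand in for the two branches. It writes the same file to each branch in turn with fsync(), then simulates a client crash between the two writes:

```python
import hashlib
import os
import tempfile


def write_replicas(branches, name, data, crash_after=None):
    """Write data to the same file name in every branch, one branch at a time.

    crash_after is a hypothetical knob for this sketch: stop once that many
    branches have been written, to simulate a client crash between writes.
    """
    for i, branch in enumerate(branches):
        if crash_after is not None and i >= crash_after:
            return  # simulated crash: the remaining branches keep their old data
        path = os.path.join(branch, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # synchronous I/O: flush to storage before moving on


def digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def replicas_agree(branches, name):
    """Return True if every branch holds byte-identical content for name."""
    digests = {digest(os.path.join(branch, name)) for branch in branches}
    return len(digests) == 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        branches = [d1, d2]
        write_replicas(branches, "f", b"version 1")
        print("after clean write:", replicas_agree(branches, "f"))
        write_replicas(branches, "f", b"version 2", crash_after=1)
        print("after simulated crash:", replicas_agree(branches, "f"))
```

After the simulated crash the two copies differ. A checksum comparison can detect that, but it cannot say which copy is the correct one without some extra record, such as a version number or a write-intent log.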

What was your motivation for layering this on top of Lustre instead of creating a new RAID1 LOV object layout inside Lustre to do the same thing?

Cheers, Andreas

> 2012/5/25, Andreas Dilger <adilger at whamcloud.com>:
>> On 2012-05-24, at 8:34 AM, 李希 wrote:
>>>     I am happy to announce the release of MTFS. It is an open source
>>> project whose aim is to improve the reliability of Lustre and other file
>>> systems. MTFS is a stackable file system which uses the lower file system's
>>> directories as its branches and automatically generates multiple identical
>>> replicas of files and directories, while keeping everything transparent
>>> to users.
>>>    It can be downloaded from multifs.com/mtfs_newest.tar.gz or
>>> http://code.google.com/p/mtfs/downloads/list. Any suggestions or questions
>>> are welcome.
>> 
>> This looks interesting - pretty similar to GlusterFS I guess?
>> 
>> Short of reading the code, is there any description of how this works?
>> Is it a user-space FUSE driver, a kernel module, something else?
>> How does it achieve increased reliability?  How does it deal with faults?

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/