[Lustre-discuss] Problems with failover

Aaron Knister aaron at iges.org
Fri Jan 4 12:35:35 PST 2008


Personally, I strongly advise against using compute nodes to host any
type of storage service. If a user job crashes a compute node (it will
usually take out several), you're once again up a creek. I don't know
of any filesystem that could handle the failure of more than two or
three underlying storage components. Separating storage from
computation was the best decision I've ever made, because it allows
both to be scaled independently. Am I totally missing the mark here?
If you still want to do this, try the Gfarm filesystem; there's
another one too, but I can't think of the name. If I find it, I'll let
you know.
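
For context on the shared-disk failover Andreas describes below: the
failover partner for a Lustre target is normally declared when the
target is formatted, so that either server can mount the shared
device. A rough sketch, where the NIDs (oss2@tcp0, mgs@tcp0), the
fsname, and the device/mount paths are illustrative, not from this
thread:

```shell
# Format an OST on shared storage, naming the partner node that may
# take it over (all names/paths here are made up for illustration).
mkfs.lustre --ost --fsname=testfs \
            --mgsnode=mgs@tcp0 \
            --failnode=oss2@tcp0 \
            /dev/sdb

# If the primary OSS dies, the partner mounts the same shared device:
mount -t lustre /dev/sdb /mnt/ost0
```

Note that this only works when both nodes can physically reach the
same disk (SAN or multi-port FC/SCSI); it does not replicate data.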

On Jan 4, 2008, at 11:10 AM, Jeremy Mann wrote:

>
> On Thu, 2008-01-03 at 17:34 -0700, Andreas Dilger wrote:
>
>> To be clear - Lustre failover has nothing to do with data replication.
>> It is meant only as a mechanism to allow high availability of shared
>> disk.  This means more than one node can serve shared disk from a SAN
>> or from multi-port FC/SCSI disks.
>
> How would one build a reliable system with 20 OSTs? Our system contains
> 20 compute nodes, each with two 200GB drives in a RAID0 configuration.
> Each node acts as an OST and as a failover partner for another, i.e.
> 0-1, 1-2, 3-4, etc.
>
> I can start from scratch, so I'm thinking of rebuilding the RAID arrays
> with RAID1 to compensate for disk failures. But that still leaves me
> questioning whether, if a node goes down or we lose another drive, we'll
> be back to the same problems we've been having.
>
> -- 
> Jeremy Mann
> jeremy at biochem.uthscsa.edu
>
> University of Texas Health Science Center
> Bioinformatics Core Facility
> http://www.bioinformatics.uthscsa.edu
> Phone: 210-567-2672
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org






