[Lustre-discuss] HW experience

Balagopal Pillai pillai at mathstat.dal.ca
Wed Mar 26 18:31:20 PDT 2008


Hi,

     I have a Lustre setup with two OSSes and 8 MD1000 enclosures connected 
to two PERC 5/E controllers, with ~90 TB of raw storage. The MDS sits on six 
drives on a separate PERC 5/i controller. It uses bonded gigabit Ethernet, 
and performance is not bad. I have another Lustre setup with a single OSS 
and a 3ware 9650, and its performance is better than that of the PERC-based 
setup. If the new PERC 6 is based on the LSI 8888 (it supports RAID 6, which 
suggests it might be, though I could be wrong on this assumption), then the 
performance of PERC 5 and PERC 6 can't be directly compared, as the PERC 5 
is an Intel IOP-based controller while the 8888 is PowerPC-based. Some 
benchmarks on the internet list the 8888 as a very good controller, with 
some of the best performance available. So if the PERC 6 is based on the 
8888, it is likely better than the PERC 5. 
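
     For what it's worth, a crude way to compare raw streaming throughput 
between two setups is something like the sketch below. The mount point, 
file size and block size are only placeholders, and the file should be 
bigger than the client's RAM so the page cache doesn't flatter the numbers.

#!/usr/bin/env python
# Crude sequential-write check (a sketch, not a proper benchmark).
# Placeholders: /mnt/lustre mount point, 4 GB test file, 1 MB writes.
import os, time

TEST_FILE = "/mnt/lustre/throughput_test.dat"    # hypothetical path
BLOCK = 1024 * 1024                              # 1 MB per write
TOTAL = 4 * 1024 * 1024 * 1024                   # 4 GB in total

buf = os.urandom(BLOCK)
start = time.time()
f = open(TEST_FILE, "wb")
written = 0
while written < TOTAL:
    f.write(buf)
    written += BLOCK
f.flush()
os.fsync(f.fileno())     # push the data out to the OSTs before stopping the clock
f.close()
elapsed = time.time() - start
print("%.0f MB in %.1f s -> %.1f MB/s" % (TOTAL / 1048576.0, elapsed,
                                          TOTAL / 1048576.0 / elapsed))
os.remove(TEST_FILE)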

         If there is only one MD1000, then you should look at the 
performance of split mode, with one LUN per RAID 5 volume (or better, 
RAID 6 for peace of mind) and one hot spare in the RAID 6 case. PERC 6 was 
my first choice over PERC 5 just to get RAID 6, but it was not out yet when 
the equipment was purchased. The MD1000s I have here use Seagate Barracuda 
ES drives, which are supposed to be enterprise hard drives. But SATA drives 
do have a high failure rate; I have had 6 fail in the last 4-5 months (out 
of 120 in total), which is not that high, especially since the arrays are 
almost full. PERC 5 does have some weird characteristics. For example, a 
month ago a drive in one MD1000 enclosure started showing "unexpected 
sense" errors. Unfortunately it started happening twice or thrice per 
second, and the monitoring daemon sent an email for every instance because 
I had configured the alert level too high. Finally, when the drive failed, 
one more drive dropped out of another volume for no apparent reason; Dell 
was also unable to explain that. I was fortunate that it didn't drop out of 
the same RAID 5 volume as the actual failing drive. The dropped drive is 
still a good drive and is now in the global hot-spare pool. In hindsight it 
was good that I didn't go for a single big RAID 5 volume with 13 or 14 
drives and instead went for two RAID 5 volumes per enclosure (7 + 6, plus 
two hot spares), precisely for this scenario: it reduces the risk of 
multiple simultaneous drive failures in the same volume. But as I mentioned 
before, if the PERC 6 is based on the LSI 8888, then it is a totally new 
product, not directly comparable to the PERC 5. 
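
         To put rough numbers on that capacity/risk trade-off, here is a 
back-of-the-envelope sketch. The 750 GB drive size is only inferred from 
~90 TB of raw storage over 120 drives, so treat it as an assumption.

# Back-of-the-envelope comparison of MD1000 layouts (15 drive slots).
DRIVE_GB = 750          # assumed Barracuda ES size; adjust to your drives

def raid5_usable(n):
    # RAID 5 gives up one drive's worth of capacity to parity.
    return (n - 1) * DRIVE_GB

layout_a = raid5_usable(14)                    # one 14-drive RAID 5 + 1 spare
layout_b = raid5_usable(7) + raid5_usable(6)   # 7 + 6 RAID 5 + 2 spares

print("single 14-drive RAID 5: %d GB usable" % layout_a)   # 9750 GB
print("two volumes (7 + 6):    %d GB usable" % layout_b)   # 8250 GB
# The split layout costs a second parity drive and a second hot spare
# (two drives of capacity), but a second drive dropping out is only fatal
# if it lands in the same, smaller, volume as the first failure.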

       Also watch out for some critical Lustre bugs that are showstoppers, 
like this one: 
https://bugzilla.lustre.org/show_bug.cgi?id=13438  Until that patch came 
out, the two OSSes crashed on a daily basis for two months. There is 
another bug affecting NFS exports that requires a reboot of the Lustre 
client doing the NFS export. I have worked around it for now by using a 
virtual machine for the NFS export of Lustre volumes, so that a reboot 
won't affect running compute jobs. There is also the problem, explained in 
an email on the list a few days ago, of clients getting evicted. So I was 
concentrating much more on the MD1000s in the beginning, but in the end I 
was more than happy when I got a working, stable Lustre configuration, and 
now I am not too keen to extract the last ounce of performance out of the 
MD1000s anymore :-) 

Regards
Balagopal


On Wed, 26 Mar 2008, Martin Gasthuber wrote:
> Hi,
> 
>   we would like to establish a small Lustre instance, and for the OSTs we are
> planning to use standard Dell PE1950 servers (2x QuadCore + 16 GB RAM) and
> for the disks a JBOD (MD1000) driven by the PE1950's internal RAID controller
> (RAID-6). Any experience (good or bad) with such a config?
> 
> thanks,
>    Martin
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 


