[Lustre-discuss] external journal raid1 vs. single disk ext journal + hot spare on raid6

Stuart Marshall stuart.l.marshall at gmail.com
Thu May 14 13:08:36 PDT 2009


Hi All,

With the upgrade from 1.6.x to 1.8.x we are planning to reconfigure our RAID
systems.

The OST RAID hardware is Sun 6140 arrays with 16x500GB SATA disks.  Each
6140 tray has one OSS node (Sun X2200 M2).  We have redundant paths and
ultimately plan a failover strategy.  The MDT will be a RAID 1+0 Sun 2540
with 12x73GB SAS disks.

Each 6140 tray will be configured as either 1 or 2 RAID6 volumes.  The
Lustre manual recommends more, smaller OSTs over fewer large ones, and other
docs I've seen seem to indicate that the optimal number of drives is ~(6+2).
For these 16-disk trays, the choice would be one (12+2) RAID6 volume or two
(5+2) RAID6 volumes.  Either way, 2 disks are left over for external
journal(s) and/or hot spares.
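
To make the disk budget concrete, here is a rough sketch in Python.  It only
does the arithmetic for the two layouts mentioned above; the function and
variable names are made up for illustration, and the capacities are raw data
capacity (no file system or journal overhead is accounted for).

```python
# Disk budget for one 16-disk tray of 500 GB drives.
# All names here are illustrative, not part of any Lustre or array tool.
TRAY_DISKS = 16
DISK_GB = 500
RAID6_PARITY = 2  # RAID6 uses two parity disks per volume


def leftover_disks(volumes, data_disks, tray_disks=TRAY_DISKS):
    """Disks not consumed by the RAID6 volumes (journal and/or hot spare)."""
    used = volumes * (data_disks + RAID6_PARITY)
    if used > tray_disks:
        raise ValueError("layout needs more disks than the tray holds")
    return tray_disks - used


def raw_data_gb(volumes, data_disks, disk_gb=DISK_GB):
    """Raw data capacity of the RAID6 volumes, ignoring all overhead."""
    return volumes * data_disks * disk_gb


layouts = {"1 x (12+2)": (1, 12), "2 x (5+2)": (2, 5)}
for name, (volumes, data_disks) in layouts.items():
    print(name,
          "leftover disks:", leftover_disks(volumes, data_disks),
          "raw data GB:", raw_data_gb(volumes, data_disks))
```

Both layouts leave 2 disks free; the single (12+2) volume has 6000 GB of raw
data capacity versus 5000 GB for the two (5+2) volumes.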

So my questions are:

1.) What are the trade-offs of a RAID1 external journal with no hot spare vs.
a single-disk external journal with a hot spare (the spare being for the
RAID6 volume)?
Specifically:

- If a single disk external journal is lost, can we run fsck and only lose
the transactions that have not been committed to disk?  If so, then the loss
of the disk hosting the external journal would not be catastrophic for the
file system as a whole.

- How comfortable are RAID6 users with no hot spares? (We'll have cold
spares handy, but would prefer to get through weekends without a service
visit.)

2.) The external journal only takes up ~400MB.  If we create 2 RAID6
volumes, can we put 2 external journals on one disk or RAID1 set (suitably
partitioned), or do we need to blow an entire disk for one external journal?

3.) In planning for "segment size" (chunk size in the Lustre manual), we'd
have to go to 128kB or lower.  However, in single-disk tests (SATA), it seems
that larger is better, so perhaps this argues for small RAID6 sets as
mentioned in the manual.  Just wondering what other folks have found here
as well.
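
As a sketch of where a "128kB or lower" limit can come from: this assumes the
goal is that one full RAID stripe (segment size x number of data disks) does
not exceed 1 MB, and that the array only offers power-of-two segment sizes.
Both are assumptions for illustration, not something checked against the
6140 firmware, and the function name is made up.

```python
# Largest power-of-two segment size (in kB) such that a full stripe
# (segment * data_disks) still fits in the target stripe width.
# ASSUMPTION: target full-stripe width is 1 MB (1024 kB).
def max_segment_kb(data_disks, stripe_kb=1024):
    if data_disks < 1 or data_disks > stripe_kb:
        raise ValueError("data_disks must be between 1 and stripe_kb")
    segment = 1
    # Keep doubling while the next size would still fit in one stripe.
    while segment * 2 * data_disks <= stripe_kb:
        segment *= 2
    return segment


for data_disks in (12, 5, 8, 4):
    seg = max_segment_kb(data_disks)
    print(data_disks, "data disks ->", seg, "kB segment,",
          seg * data_disks, "kB full stripe")
```

Under these assumptions, 12 data disks give a 64 kB segment (768 kB full
stripe) and 5 data disks give 128 kB (640 kB full stripe).  Neither divides
1 MB evenly; only power-of-two data disk counts such as 4 or 8 fill the 1 MB
stripe exactly.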

We have the opportunity to test several scenarios with two 6140 trays that
are not part of the 1.6.x production system, so I expect we will test
performance as a function of the number of drives in the RAID6 volume (e.g.
12+2 vs. 5+2) along with array write segment sizes via sgpdd-survey.
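
A minimal sketch of the test matrix this implies, just to enumerate the
combinations.  The segment sizes listed are placeholders (hypothetical, not
a statement of what the 6140 offers), and nothing here invokes sgpdd-survey
itself.

```python
from itertools import product

# RAID6 layouts under consideration: (data disks, parity disks).
raid6_layouts = [(12, 2), (5, 2)]
# HYPOTHETICAL segment sizes in kB; substitute whatever the array supports.
segment_sizes_kb = [32, 64, 128]


def build_test_matrix(layouts, segments_kb):
    """One entry per (layout, segment size) combination to benchmark."""
    return [
        {
            "data_disks": data,
            "parity_disks": parity,
            "segment_kb": seg,
            "full_stripe_kb": data * seg,
        }
        for (data, parity), seg in product(layouts, segments_kb)
    ]


for case in build_test_matrix(raid6_layouts, segment_sizes_kb):
    print(case)
```

With two layouts and three segment sizes this yields six runs per tray
configuration.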

I'll report back with test results once we sort out which knobs seem to make
the most difference.

Any advice or comments welcome,
Stuart