[Lustre-discuss] Harddisk Allocation

Dilger, Andreas andreas.dilger at intel.com
Fri Jun 14 17:39:07 PDT 2013





On 2013-06-14, at 17:54, "Chan Ching Yu, Patrick" <cychan at clustertech.com> wrote:
I am considering the harddisk allocation for the Lustre storage system.
There are two Lustre IO servers in total: one acts as MDS/OSS, the other acts as a pure OSS.

Note that if the MDS and OSS share the same node and that node fails, the OSTs on it will not be able to complete recovery.

Both IO servers connect to an MD3200, which is daisy-chained to 4 MD1200 enclosures.
Each MD storage system is equipped with 12 600GB harddisks.
I use ASCII-art to illustrate the storage system as follows:
(I also have the JPEG, but I dunno if JPEGs are allowed on this mailing list; tell me if you can’t see the text-formatted picture below.)

  MDS/OSS    OSS
     |        |
 ________________
|                |   1   4   7   10
|     MD3200     |   2   5   8   11
|________________|   3   6   9   12
 ________________
|                |   1   4   7   10
|     MD1200     |   2   5   8   11
|________________|   3   6   9   12
 ________________
|                |   1   4   7   10
|     MD1200     |   2   5   8   11
|________________|   3   6   9   12
 ________________
|                |   1   4   7   10
|     MD1200     |   2   5   8   11
|________________|   3   6   9   12
 ________________
|                |   1   4   7   10
|     MD1200     |   2   5   8   11
|________________|   3   6   9   12

This is my plan of harddisk allocation:

Harddisk 1 and 7 of MD3200 form a RAID-1 disk group;
this disk group has multiple virtual disks: one of them is the MGS, the others are MDT1, MDT2, MDT3, etc.

It is fine to share the MGS and one MDT, but it would be better to create RAID-1 groups on the #1 and #7 disks for each of the MDTs.

Harddisk 2,3,4,5,6 of MD3200/MD1200 form a RAID-5 disk group,
this disk group only has one virtual disk, the virtual disk is the OSS

Harddisk 8,9,10,11,12 of MD3200/MD1200 form a RAID-5 disk group,
this disk group only has one virtual disk, the virtual disk is the OSS

It might make more sense to use the same disk from each of the MD enclosures so that if an enclosure fails (except the MD3200) you only lose one disk from each RAID-5 group.
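As a rough illustration of that layout, here is a minimal sketch. It assumes the OST slots from the plan above (2-6 and 8-12) and five enclosures; the enclosure labels and helper names are made up for the example. Each RAID-5 group takes the same slot number from every enclosure, so any single enclosure failure costs each group at most one disk:

```python
# Hypothetical sketch: enclosure labels and helper names are illustrative only.
ENCLOSURES = ["MD3200", "MD1200-1", "MD1200-2", "MD1200-3", "MD1200-4"]
# Assumption: slots 1 and 7 are kept for MGS/MDT use, the rest hold OST data.
OST_SLOTS = [2, 3, 4, 5, 6, 8, 9, 10, 11, 12]

def groups_by_slot(enclosures, slots):
    """Build one RAID group per slot number, with one member disk per enclosure."""
    return {slot: [(enc, slot) for enc in enclosures] for slot in slots}

def disks_lost(group, failed_enclosure):
    """Count the members of a group that sit in the failed enclosure."""
    return sum(1 for enc, _ in group if enc == failed_enclosure)

groups = groups_by_slot(ENCLOSURES, OST_SLOTS)
worst = max(disks_lost(g, enc) for g in groups.values() for enc in ENCLOSURES)
print(len(groups), "groups of", len(ENCLOSURES), "disks each;",
      "worst-case disks lost per group on one enclosure failure:", worst)
```

With five enclosures this gives ten 4+1 groups, the same count as the per-enclosure plan. The MD3200 is still the exception noted above.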

Harddisk 1 and 7 of all MD1200 are unused (or used as hot spares)

Following both comments above, it might be better to use e.g. #1 from each of two different MD enclosures to make a single RAID-1.

The caveat is if

Each OSS has 4 effective harddisks, segment size 256KB, so that the stripe size is 1MB

You could also use 128k segment size and have a 1MB write cover two chunks. This has the advantage that 512kB writes can also be done w/o read-modify-write.
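To make the segment-size arithmetic concrete, here is a small sketch. It assumes 4 data disks per RAID-5 group (as in the plan above) and writes that start on a stripe boundary; the function names are made up for the example. A write avoids read-modify-write when it covers a whole number of full stripes:

```python
KIB = 1024

def full_stripe_bytes(data_disks, segment_kib):
    """Size of one full stripe: one segment on each data disk."""
    return data_disks * segment_kib * KIB

def is_full_stripe_write(write_bytes, data_disks, segment_kib):
    """True if a stripe-aligned write covers only whole stripes, so parity
    can be computed from the new data alone (no read-modify-write)."""
    return write_bytes % full_stripe_bytes(data_disks, segment_kib) == 0

for segment_kib in (256, 128):
    stripe_kib = full_stripe_bytes(4, segment_kib) // KIB
    for write_kib in (512, 1024):
        ok = is_full_stripe_write(write_kib * KIB, 4, segment_kib)
        print(f"segment {segment_kib} KiB, stripe {stripe_kib} KiB, "
              f"write {write_kib} KiB: full-stripe write = {ok}")
```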

It is much more common to use 8+2 in RAID-6 configuration with Lustre and have 128kB segments. That allows the same 25% parity overhead but has more redundancy (at some cost in performance).
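A quick sketch of that comparison, using only the numbers above (the helper name and dictionary keys are made up for the example). It computes the full-stripe size, the ratio of parity disks to data disks, and how many disk failures each layout survives:

```python
def raid_summary(data_disks, parity_disks, segment_kib):
    """Summarise a parity RAID layout (illustrative helper, not a vendor tool)."""
    return {
        "full_stripe_kib": data_disks * segment_kib,
        "parity_per_data": parity_disks / data_disks,
        "disk_failures_tolerated": parity_disks,
    }

raid5_4p1 = raid_summary(data_disks=4, parity_disks=1, segment_kib=256)
raid6_8p2 = raid_summary(data_disks=8, parity_disks=2, segment_kib=128)
print("RAID-5 4+1, 256 KiB segments:", raid5_4p1)
print("RAID-6 8+2, 128 KiB segments:", raid6_8p2)
```

Both layouts give a 1MB full stripe and one parity disk per four data disks. The 8+2 group survives two disk failures instead of one.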

Cheers, Andreas

The local harddisk of each IO server is very small, so I don’t intend to use them as MGS/MDT.


My plan above is based on the following (not in priority order):
a) Easy to remember
b) Performance
c) Utilize the available harddisks as much as possible

Do you have any better suggestions?

Thanks.
CY
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


