[Lustre-discuss] lustre and software RAID

Sam Aparicio saparicio at bccrc.ca
Sun Jan 23 17:38:06 PST 2011

yes, mdadm first then format.

I have now managed to get mkfs to produce a mountable filesystem on this server. However, attempts to format a filesystem with an external journal (-J device=/dev/sdc, where /dev/sdc was previously created and formatted as a journal device) seem to fail silently: mkfs completes without errors, but the result is unusable. Formatting as below (without -J) and then using tune2fs to attach the external journal works fine. I am not quite sure what was happening, but the process now works: the filesystem mounts, can be written to, and can be modified with tune2fs. I am looking into this some more. I suspect the journal device may have been corrupted before the OST was formatted. Is it possible that formatting an OST with an external journal would fail silently if the journal was missing or corrupted?
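For reference, a minimal sketch of the tune2fs workaround described above, assuming /dev/sdc is the external journal device and /dev/md2 is the OST array; the `<mgs-nid>` placeholder is hypothetical and the exact flags should be checked against your e2fsprogs version:

```shell
# 1. Create the external journal device (a journal_dev "filesystem"):
mke2fs -O journal_dev -b 4096 /dev/sdc

# 2. Format the OST normally, without -J (an internal journal is created):
mkfs.lustre --ost --fsname=lustre --mgsnode=<mgs-nid>@tcp0 /dev/md2

# 3. Drop the internal journal, then attach the external one with tune2fs:
tune2fs -O ^has_journal /dev/md2
tune2fs -J device=/dev/sdc /dev/md2
```

The block size of the journal device must match the block size of the filesystem that uses it, which is why -b 4096 is given explicitly in step 1.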

Interestingly, the reason I was experimenting with this test server was to see whether software RAID10 on the OSS would do a better job than having the external disk enclosure present the disks as a RAID10 LUN (our disk enclosures have this capability on board). Some basic write testing suggests that software RAID10 on the OSS server performs better, perhaps 15-20% higher sustained write throughput (beyond any possible buffering in RAM), than the same disks exported as a RAID10 LUN.
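A simple way to get a sustained-write comparison of the two configurations is direct I/O with a file larger than RAM; this is only a rough sketch, and the mount point and size are assumptions to adjust for your system:

```shell
# Sustained-write test bypassing the page cache (oflag=direct),
# writing 64 GiB so the result is not flattered by RAM buffering.
# Assumption: the OST under test is mounted at /mnt/ost-test.
dd if=/dev/zero of=/mnt/ost-test/writetest bs=1M count=65536 oflag=direct conv=fsync
rm /mnt/ost-test/writetest
```

Running the same command against the md-backed OST and the enclosure-LUN-backed OST gives directly comparable MB/s figures from dd's summary line.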

thanks for your input on this.


From: Andreas Dilger [adilger at whamcloud.com]
Sent: Saturday, January 22, 2011 9:26 PM
To: Sam Aparicio
Cc: Eudes PHILIPPE; lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] lustre and software RAID

Presumably, unlike the order shown below, you run the mkfs.lustre AFTER the mdadm command?

Cheers, Andreas

On 2011-01-21, at 14:55, Samuel Aparicio <saparicio at bccrc.ca> wrote:


mkfs.lustre --ost --fsname=lustre --reformat --mgsnode= at tcp0 /dev/md2

mdadm -v --create /dev/md2 --chunk=256 --level=raid10 --raid-devices=16 --spare-devices=1 --assume-clean --layout=n2 /dev/etherd/e5.9 /dev/etherd/e5.10 /dev/etherd/e5.11 /dev/etherd/e5.12 /dev/etherd/e5.13 /dev/etherd/e5.14 /dev/etherd/e5.15 /dev/etherd/e5.16 /dev/etherd/e5.17 /dev/etherd/e5.18 /dev/etherd/e5.19 /dev/etherd/e5.20 /dev/etherd/e5.21 /dev/etherd/e5.22 /dev/etherd/e5.23 /dev/etherd/e5.7 /dev/etherd/e5.8
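For clarity, the working sequence (mdadm first, then format, as confirmed at the top of the thread) is, schematically; `<mgs-nid>` is a placeholder, not the actual NID:

```shell
# 1. Create the md array first:
mdadm -v --create /dev/md2 --chunk=256 --level=raid10 --raid-devices=16 \
      --spare-devices=1 --layout=n2 \
      /dev/etherd/e5.7 /dev/etherd/e5.8   # ...remaining member devices as above

# 2. Only then format the resulting md device as an OST:
mkfs.lustre --ost --fsname=lustre --reformat --mgsnode=<mgs-nid>@tcp0 /dev/md2
```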

cat /proc/mdstat

md2 : active raid10 etherd/e5.8[16](S) etherd/e5.7[15] etherd/e5.23[14] etherd/e5.22[13] etherd/e5.21[12] etherd/e5.20[11] etherd/e5.19[10] etherd/e5.18[9] etherd/e5.17[8] etherd/e5.16[7] etherd/e5.15[6] etherd/e5.14[5] etherd/e5.13[4] etherd/e5.12[3] etherd/e5.11[2] etherd/e5.10[1] etherd/e5.9[0]
      15628113920 blocks 256K chunks 2 near-copies [16/16] [UUUUUUUUUUUUUUUU]

Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200  cellphone: +1 604 762 5178  lab website: http://molonc.bccrc.ca

On Jan 21, 2011, at 1:01 PM, Eudes PHILIPPE wrote:

I’m not an expert on Lustre, I'm just beginning with it ☺ but…

What is your version of e2fsprogs?

What is your command line to format your raid?


From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On behalf of Samuel Aparicio
Sent: Friday, January 21, 2011 21:37
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] lustre and software RAID

I am having the following issue:

I am trying to create an ext4 Lustre filesystem attached to an OSS.
The disks being used are exported from an external disk enclosure.
I create a RAID10 set with mdadm from 16 2TB disks; this part seems fine.
I am able to format such an array with normal ext4, mount a filesystem, etc.
However, when I try the same thing formatting for a Lustre filesystem, I am unable to mount the filesystem and Lustre does not seem to detect it.
The Lustre format completes normally, without errors.

If I arrange to present the disks as a RAID10 set from the external disk enclosure, which has its own internal RAID capability
(rather than using mdadm on the OSS), the Lustre formatting works fine and I get a mountable OST.

the kernel log reports the following when a mount is attempted:

LDISKFS-fs (md2): VFS: Can't find ldiskfs filesystem
LustreError: 15241:0:(obd_mount.c:1292:server_kernel_mount()) premount /dev/md2:0x0 ldiskfs failed: -22, ldiskfs2 failed: -19.  Is the ldiskfs module available?
LustreError: 15241:0:(obd_mount.c:1618:server_fill_super()) Unable to mount device /dev/md2: -22
LustreError: 15241:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount  (-22)

lsmod reports that all the modules are loaded

fsck reports the following
fsck 1.41.10.sun2 (24-Feb-2010)
e2fsck 1.41.10.sun2 (24-Feb-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev//md2

It would seem the filesystem has not been written properly, even though mkfs reports no errors.

lustre version 1.8.4
kernel 2.6.18-194.3.1.el5_lustre.1.8.4
disk array is a coraid SATA/AOE device which has worked fine in every other context

This looks like an interaction between Lustre and software RAID on the OSS.
Has anyone seen anything like this before? Any ideas?


Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org

