[Lustre-discuss] Plateau around 200MiB/s bond0

Arden Wiebe albert682 at yahoo.com
Tue Jan 27 23:38:10 PST 2009

--- On Mon, 1/26/09, Brian J. Murrell <Brian.Murrell at Sun.COM> wrote:

From: Brian J. Murrell <Brian.Murrell at Sun.COM>
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
To: lustre-discuss at lists.lustre.org
Date: Monday, January 26, 2009, 6:59 AM

In general, when writing messages to this list, you need to be more
concise about what you are asking.  I see so much information here, I'm
not sure what is relevant to your few interspersed questions and what is
not.  I will try to answer your specific question...

My apologies for posting my study hacks to the list.  Thanks, Brian, for at least trying to answer questions whose answers I have to learn for myself before I even know the correct question to ask.

Also, in the future, please use a simple plain-text format and just copy
and paste for plain-text content.  All of the "quoted-printable"
mime-types are confusing my MUA.

No doubt.  Sorry, I'm not good with MTAs or MUAs in general, but I'll switch to plain text in the future.

On Sat, 2009-01-24 at 18:04 -0800, Arden Wiebe wrote:
> I fail so far creating external journal for MDT, MGS and OSSx2.  How
> to add the external journal to /etc/fstab specifically the output of
> e2label /dev/sdb followed by what options for fstab?

You need to look at the mkfs.ext3 manpage on how to create an external
journal (i.e. -O journal_dev external-journal) and attach an external
journal to an ext3 filesystem (i.e. -J device=external-journal) then
apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's
--mkfsoptions option.

All of this is covered in the operations manual in section 10.3
"Creating an External Journal".

Been there, done that; well, sort of.  I managed to give every Lustre filesystem an external journal, some even on different controllers.  The underlying root/boot layout keeps the RAID separate from the MBR and the un-RAIDed root and boot partitions, which could eventually be moved to a USB memory stick to free up /dev/sda as a hot spare.

The eventual goal for the root filesystem is a network/cluster configuration tool, so that root/boot partitions can be delivered over the cluster to new and old nodes.  Until then the DVD .iso method works fine and can rehabilitate a failed boot drive in the time of a standard CentOS 5.2 install.

The manual (or the list; I'm not quoting exactly) says in numerous places: no partitions.  There are no partitions in this configuration except for a 1TB / partition on /dev/sda1 of all main nodes, plus the external journals: /dev/sdf1 on the MDT and MGS, and /dev/sdb1 on the two OSTs, each a ",50,L" slice of the 1TB drive for what is no doubt the ~400MB journal.
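For anyone puzzled by the ",50,L" notation: it is sfdisk input, meaning default start, 50 cylinders in size (roughly 400MB with the usual 255-head/63-sector geometry), partition type L (Linux, 0x83).  Something like this (a sketch; the device name is an example, and this rewrites the partition table on it):

```shell
# Carve a ~400MB journal partition at the front of the drive;
# sfdisk input lines are "start,size,type" (cylinder units by default)
echo ',50,L' | sfdisk /dev/sdf
```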

The solution at the time was to learn the proper syntax for creating a raid10 device.  Instead of physically making two RAID1 arrays plus one RAID0 array to build a 1+0 configuration, I had to learn the right way to make a native raid10 array; believe it or not.  e2label was reporting MGS for two drive volumes and fstab was all borked.
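The one-step version looks something like this (a sketch, assuming four spare 1TB drives /dev/sd[b-e]; adjust the names, and note mdadm will destroy anything already on them):

```shell
# A native raid10 array in one command, instead of stacking
# two RAID1 mirrors under a RAID0 stripe by hand
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    --layout=n2 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# The nested 1+0 equivalent I was building by hand would have been:
#   mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
#   mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd /dev/sde
#   mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2
```

The native array is one device to label and one entry in mdadm.conf, which avoids exactly the e2label/fstab confusion above.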

To top it all off, I was dealing with a network anomaly that still persists on my MGS node: I can't run that node at MTU 9000 while the rest of the nodes can.  I even pulled the box off the shelf, checked for hardware faults, and reseated the cards.  Removed all the network interfaces and started over.  It still persists, no doubt due to mixing MTU 1500 and MTU 9000 on the same subnet.
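One thing worth checking in a case like this: every interface on the subnet, and the switch ports between them, must agree on jumbo frames, or large packets get silently dropped.  A sketch (interface names and the test IP are examples):

```shell
# Raise the MTU on the bond and its slaves
ip link set dev eth0 mtu 9000
ip link set dev eth1 mtu 9000
ip link set dev bond0 mtu 9000

# Verify a full 9000-byte frame survives end to end without fragmenting:
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do sets don't-fragment
ping -M do -s 8972 192.168.0.10
```

If the unfragmented ping fails between the MGS and any other node but a plain ping works, something in that path is still at MTU 1500.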

Not sure if this is a proper deliverable for the list, but I have produced a series of pictures that, to my understanding, show a small Lustre Ethernet cluster running on commodity hardware doing 400MiB/s to one OST, but also one that needs to handle smaller files better: http://www.ioio.ca/Lustre-tcp-bonding/images.html and http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Typical usage so far shows that copying /var/lib/mysql is still a time-consuming process given 4.9G of data.  Web-based files in flight are also typically small.  Further objectives for the cluster are not implemented at this time but would include more of the same and then some.
Further suggestions regarding network-specific cluster enhancements, partitioning, formatting, benchmarking, or modes are appreciated.

My apologies for the --verbose thread, which I hope is better formatted to fit your screen, and for my lack of specific questions; I don't always have enough experience to know the correct ones to ask.
