[Lustre-discuss] Plateau around 200MiB/s bond0
Jeffrey Alan Bennett
jab at sdsc.edu
Wed Jan 28 12:30:05 PST 2009
Hi Arden,
Are you obtaining more than 100 MB/sec from one client to one OST? Because you are using 802.3ad link aggregation, the physical NIC is chosen by hashing the other party's MAC address, so any single client-to-OST conversation only ever uses one link. Having multiple OSTs and multiple clients will improve the chances of using more than one NIC of the bonding.
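To make that concrete, here is a sketch of the layer2 transmit hash (my paraphrase of the bonding driver, not an exact copy of its code): XOR the low bytes of the source and destination MACs, then take the result modulo the slave count. Every frame to a given peer MAC therefore maps to the same slave NIC.

```shell
# Sketch of the bonding "layer2" xmit hash; MAC bytes are taken from the
# bond0 listing below, the formula is my paraphrase of the driver.
src=0x6c     # last byte of eth1's MAC (...:77:6c)
dst=0xdb     # last byte of the peer's MAC (...:77:db)
slaves=2
slave=$(( (src ^ dst) % slaves ))
echo "frames to this peer always leave on slave $slave"
```

Since the inputs never change for a given pair of hosts, a single client talking to a single OSS can never exceed one NIC's worth of bandwidth in this mode.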
What is the maximum performance you obtain on the client with two 1GbE links?
jeff
________________________________
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Arden Wiebe
Sent: Sunday, January 25, 2009 12:08 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that add up to 400MiB/s, or is that not how to calculate throughput? I will eventually plug the right sequence into iozone to measure it.
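When I get there, I expect the invocation to look something like this (the flag choices are my guess at this point, and the file size would need tuning to exceed client RAM):

```shell
# Sketch: sequential write (-i 0) and read (-i 1), 1 MiB records,
# 8 GiB per file, 2 threads, driven across the clients listed in clients.txt
iozone -i 0 -i 1 -r 1m -s 8g -t 2 -+m clients.txt
```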
From my perspective it looks like this: ioio.ca/ioio.jpg ioio.ca/lustreone.png ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png
--- On Sat, 1/24/09, Arden Wiebe <albert682 at yahoo.com> wrote:
From: Arden Wiebe <albert682 at yahoo.com>
Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
To: lustre-discuss at lists.lustre.org
Date: Saturday, January 24, 2009, 6:04 PM
1-2948-SFP Plus Baseline 3Com Switch
1-MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1-MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2-OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1-MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1-CLIENT bond0(eth0,eth1)
1-CLIENT eth0
1-CLIENT eth0
So far I have failed to create an external journal for the MDT, MGS, and both OSSes. How do I add the external journal to /etc/fstab? Specifically, what do I do with the label that e2label /dev/sdb reports, and with what fstab options?
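My understanding of the sequence that should work (unverified; the order matters, since the journal device must exist before the OST is formatted, and the label here is my own invention):

```shell
# Create the external journal device first (label is my guess):
mke2fs -b 4096 -O journal_dev -L ioio-OST0000-journal /dev/sdb1
# Then reference it when formatting the OST:
mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7@tcp0 \
    --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
# As far as I can tell, the journal device gets recorded in the OST
# filesystem's superblock, so the journal itself would need no
# /etc/fstab line of its own.
```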
[root at lustreone ~]# cat /proc/fs/lustre/devices
0 UP mgs MGS MGS 17
1 UP mgc MGC192.168.0.7 at tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
2 UP lov ioio-clilov-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
3 UP mdc ioio-MDT0000-mdc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
4 UP osc ioio-OST0000-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
5 UP osc ioio-OST0001-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
[root at lustreone ~]# lfs df -h
UUID bytes Used Available Use% Mounted on
ioio-MDT0000_UUID 815.0G 534.0M 767.9G 0% /mnt/ioio[MDT:0]
ioio-OST0000_UUID 3.6T 28.4G 3.4T 0% /mnt/ioio[OST:0]
ioio-OST0001_UUID 3.6T 18.0G 3.4T 0% /mnt/ioio[OST:1]
filesystem summary: 7.2T 46.4G 6.8T 0% /mnt/ioio
[root at lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:00:00:00:00:00
Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1
Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2
Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3
Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4
Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5
Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6
[root at lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
976762496 blocks [2/2] [UU]
unused devices: <none>
[root at lustreone ~]# cat /etc/fstab
LABEL=/ / ext3 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=MGS /mnt/mgs lustre defaults,_netdev 0 0
192.168.0.7 at tcp0:/ioio /mnt/ioio lustre defaults,_netdev,noauto 0 0
[root at lustreone ~]# ifconfig
bond0 Link encap:Ethernet HWaddr 00:1B:21:28:77:DB
inet addr:192.168.0.7 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:12376680079 (11.5 GiB) TX bytes:34438742885 (32.0 GiB)
eth0 Link encap:Ethernet HWaddr 00:1B:21:28:77:DB
inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
UP BROADCAST RUNNING SLAVE MULTICAST MTU:9000 Metric:1
RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12290700380 (11.4 GiB) TX bytes:34438581771 (32.0 GiB)
Base address:0xec00 Memory:febe0000-fec00000
From what I have read, not having an external journal configured for the OSTs is a sure recipe for slowness, which I would rather avoid considering the goal is around 350MiB/s or more, which should be obtainable.
Here is how I formatted the raid6 device on both OSSes, which have identical disks:
[root at lustrefour ~]# fdisk -l
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 121601 976760001 83 Linux
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sde doesn't contain a valid partition table
Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdf doesn't contain a valid partition table
Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdg doesn't contain a valid partition table
Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdh doesn't contain a valid partition table
Disk /dev/md0: 4000.8 GB, 4000819183616 bytes
2 heads, 4 sectors/track, 976762496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md0 doesn't contain a valid partition table
[root at lustrefour ~]#
[root at lustrefour ~]# mdadm --create --assume-clean /dev/md0 --level=6 --chunk=128 --raid-devices=6 /dev/sd[cdefgh]
[root at lustrefour ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
3907049984 blocks level 6, 128k chunk, algorithm 2 [6/6] [UUUUUU]
in: 16674 reads, 16217479 writes; out: 3022788 reads, 32865192 writes
7712698 in raid5d, 8264 out of stripes, 25661224 handle called
reads: 0 for rmw, 1710975 for rcw. zcopy writes: 4864584, copied writes: 16115932
0 delayed, 0 bit delayed, 0 active, queues: 0 in, 0 out
0 expanding overlap
unused devices: <none>
Followed with:
[root at lustrefour ~]# mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7 at tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
[root at lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1
But that was hard to reassemble on reboot, or at least it was before I used e2label and labelled things right. Question: how do I label the external journal in fstab, if it belongs there at all? Right now I am only running
[root at lustrefour ~]# mkfs.lustre --fsname=ioio --ost --mgsnode=192.168.0.7 at tcp0 --reformat /dev/md0
So just raid6, no external journal.
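On the reassembly problem itself, recording the array in mdadm.conf should make reboots deterministic (a sketch, assuming the default config location):

```shell
# Capture the assembled array's definition by UUID:
mdadm --detail --scan >> /etc/mdadm.conf
# After this, "mdadm --assemble --scan" (or the distro initscripts) can
# bring /dev/md0 back regardless of how the disks are enumerated.
```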
[root at lustrefour ~]# cat /etc/fstab
LABEL=/ / ext3 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=ioio-OST0001 /mnt/ost00 lustre defaults,_netdev 0 0
192.168.0.7 at tcp0:/ioio /mnt/ioio lustre defaults,_netdev,noauto 0 0
[root at lustrefour ~]#
[root at lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
Number of Active OST devices : 2
Worst Read OST indx: 0 speed: 38.789337
Best Read OST indx: 1 speed: 40.017201
Read Average: 39.403269 +/- 0.613932 MB/s
Worst Write OST indx: 0 speed: 49.227064
Best Write OST indx: 1 speed: 78.673564
Write Average: 63.950314 +/- 14.723250 MB/s
Ost# Read(MB/s) Write(MB/s) Read-time Write-time
----------------------------------------------------
0 38.789 49.227 105.596 83.206
1 40.017 78.674 102.356 52.063
[root at lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
Number of Active OST devices : 2
Worst Read OST indx: 0 speed: 38.559620
Best Read OST indx: 1 speed: 40.053787
Read Average: 39.306704 +/- 0.747083 MB/s
Worst Write OST indx: 0 speed: 71.623744
Best Write OST indx: 1 speed: 82.764897
Write Average: 77.194320 +/- 5.570577 MB/s
Ost# Read(MB/s) Write(MB/s) Read-time Write-time
----------------------------------------------------
0 38.560 71.624 26.556 14.297
1 40.054 82.765 25.566 12.372
[root at lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
3536+0 records in
3536+0 records out
3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s
lustreone, lustretwo, lustrethree, and lustrefour all have the same modprobe.conf:
[root at lustrefour ~]# cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter pata_marvell
alias scsi_hostadapter1 ata_piix
options lnet networks=tcp
alias eth2 sky2
alias eth3 sky2
alias eth4 sky2
alias eth5 sky2
alias bond0 bonding
options bonding miimon=100 mode=4
[root at lustrefour ~]#
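One knob that might help (my suggestion, not something tested above): the bonding driver's xmit_hash_policy option hashes on IP addresses and ports instead of the peer MAC, which spreads different TCP connections across the slaves. A single stream still stays on one link, but multiple OSC connections get a chance to fan out:

```shell
# /etc/modprobe.conf fragment (sketch): layer3+4 hashing for mode 4
options bonding miimon=100 mode=4 xmit_hash_policy=layer3+4
```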
When I do the same from all clients I can watch /usr/bin/gnome-system-monitor, and the send and receive traffic on the various nodes reaches a plateau around 209 MiB/s. Uggh.
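For what it's worth, 209 MiB/s is suspiciously close to the wire rate of two gigabit links. A quick back-of-envelope (my arithmetic, not a measurement from this thread):

```shell
# Two 1 Gb/s links expressed in MiB/s, before protocol overhead;
# a rough ~10% Ethernet/IP/TCP tax on this lands near the observed 209.
raw=$(( 2 * 1000000000 / 8 / 1048576 ))
echo "$raw MiB/s raw for a 2-link bond"
```

If that reading is right, the plateau is simply the two-NIC clients (or the per-peer hash limit discussed earlier) saturating, not a Lustre tuning problem.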
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss