[Lustre-discuss] Reply High difference in I/O network traffic in lustre client
Lex
lexluthor87 at gmail.com
Tue Feb 2 01:11:28 PST 2010
> did you measure the performance of this system before lustre?

Tell me exactly what information would be useful for you to help diagnose our
problem, please.

> specifically, your symptoms make it look like your disk system
> can't handle the load. since you have lots of small activity,
> the issue wouldn't be bandwidth, but latency. I've normally only
> seen this on the MDS, where metadata traffic can generate quite high
> numbers of transactions, even though the bandwidth is low.
>
> for instance, is the MDS volume a slow-write form of raid like raid5 or
> raid6? MDS activity is mainly small, synchronous transactions
> such as directory updates, which is why MDS should be on raid10.
>
We use raid10 for our MDS and it is mostly idle. Below is some information on
the load average and network traffic (output from the w and bmon commands). It
doesn't look high enough to cause the delay, right?
load average: 0.05, 0.10, 0.09

Name               RX Rate     RX #      TX Rate     TX #
MDS1 (local)
  0 lo                 0 B        0          0 B        0
  1 eth0              22 B        0    344.59KiB      736
  2 eth1         670.49KiB    1.37K    267.29KiB      592
  3 bond0        670.51KiB    1.38K    611.88KiB    1.30K
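I could also watch the metadata request counters on the MDS itself while the
clients are busy, to see the transaction rate rather than just the bandwidth.
On our 1.8 setup I would expect something like the following (the stats file
name depends on the filesystem name, so the exact path here is only a guess):

  # list the MDT stats files, then sample one every second
  ls /proc/fs/lustre/mds/
  watch -n 1 cat /proc/fs/lustre/mds/<fsname>-MDT0000/stats

(llstat from the lustre utilities should give nicer per-second deltas of the
same file, if it is installed.)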
>> ... there are quite a lot of small files ( linux soft links )
>
> in a normal filesystem, symlinks are stored in the inode itself, at least
> for short symlink targets. I guess that applies to lustre as well - the
> symlink would be on the MDS. but there are issues related to the size of
> the inode on the MDS, since striping information is also stored in EAs
> which are also hopefully within the file's inode. when there's too much to
> fit into an inode, performance suffers, since the same metadata operations
> now require extra seeks.
>
I will consider this
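To check whether we are hitting that, I suppose I could look at the inode size
our MDT was formatted with; something like this on the MDS (the device name is
only an example for our setup):

  # read the ldiskfs superblock of the MDT device and show the inode size
  dumpe2fs -h /dev/sda1 2>/dev/null | grep -i 'inode size'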
>> Files are "striped" over each 2 OSTs, some are striped over all our OSTs
>> ( fewer than 2 OSTs parallel striping )
>
> whether it makes sense to stripe over all OSTs or not depends on the sizes
> of your files. but since you have only gigabit, it's probably not a good
> idea. (that is, accessing a striped file won't be any faster, since it'll
> bottleneck on the client's network port.)
>
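That makes sense. I will try cutting the wide-striped files back to a smaller
stripe count; I guess something along these lines would do it (the mount point
and directory are just examples, and I believe the -c form works on 1.8):

  # see how an existing file is currently striped (run on a client)
  lfs getstripe /mnt/lustre/data/some_big_file

  # make new files under this directory stripe across only 2 OSTs
  lfs setstripe -c 2 /mnt/lustre/data

Existing files keep their old layout, so the wide-striped ones would have to be
copied to pick up the new stripe count.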
Separately, could you please explain in more detail what the disadvantage of
1 Gig Ethernet is for Lustre, and what exactly the bottleneck at the client's
network port is? (I tried installing more NICs in the client and bonding them
together, but it didn't help.) I found a paper via Google claiming that bonding
3 x 1 Gig Ethernet devices improves this problem significantly. But in our case
I can't even reach the limit of a single 1 Gig link!
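I should probably double-check what the bond is actually doing on the client;
something like this (bond0 and the mode line are examples, not necessarily our
exact configuration):

  # show the bonding mode and per-slave state/counters
  cat /proc/net/bonding/bond0

  # on RHEL5 the mode is usually set in /etc/modprobe.conf, e.g.
  #   options bond0 mode=balance-alb miimon=100
  grep -i bond /etc/modprobe.conf

As far as I understand, some bonding modes hash each connection onto a single
slave, so one client-to-OST stream would never exceed one link anyway.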
>
>> Do you have any idea for my issue ?
>
> I think you need to find out whether the performance problem is merely
> due to latency (metadata rate) on the MDS. looking at normal performance
> metrics on the MDS when under load (/proc/partitions, etc) might be able
> to show this. even "vmstat 1" may be informative, to see what sorts of
> blocks-per-second IO rates you're getting.
>
>
Here is the output of vmstat 1 over 10 seconds:
root@MDS1:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free    buff   cache  si  so  bi  bo    in    cs us sy id wa st
 1  0    140 243968 3314424 432776   0   0   1   6     2     1  0  2 97  1  0
 0  0    140 244092 3314424 432776   0   0   0   4  3037  6938  0  2 97  1  0
 0  0    140 244092 3314424 432776   0   0   0   4  2980  6759  0  2 98  1  0
 0  0    140 244216 3314424 432776   0   0   0  16  3574  8966  0  3 94  3  0
 0  0    140 244092 3314424 432776   0   0   0   4  3511  8639  1  2 97  1  0
 0  1    140 244092 3314424 432776   0   0   0  36  3549  8871  0  2 97  1  0
 0  0    140 244092 3314424 432776   0   0   0   4  3085  7304  0  2 97  1  0
 0  0    140 243968 3314424 432776   0   0   0  20  3199  7566  0  2 97  1  0
 0  0    140 244092 3314424 432776   0   0   0  16  3294  7950  0  2 95  3  0
 0  0    140 244092 3314424 432776   0   0   0   4  3336  8301  0  2 97  1  0
And here is iostat -m 1 5:
Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (MDS1)      02/02/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.02    1.53    1.33    0.00   96.96

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.66         0.00         0.02      12304      79721
drbd1             6.43         0.00         0.02      10709      70302

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    2.24    0.75    0.00   96.26

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.00         0.00         0.00          0          0
drbd1             1.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.75    1.00    0.00   97.24

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               4.00         0.00         0.05          0          0
drbd1             1.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.00    3.50    0.00   94.50

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.00         0.00         0.02          0          0
drbd1             4.00         0.00         0.02          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.49    0.75    0.00   96.76

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.00         0.00         0.00          0          0
drbd1             1.00         0.00         0.00          0          0
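To be more certain, I can also capture the extended per-device statistics while
the clients are actively hitting the filesystem (sda is our MDT backing device
here, and the interval/count are just examples):

  # high await or %util on the MDT device would point at the disks rather
  # than the network
  iostat -x 1 10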
From the numbers above, I don't think our MDS is too busy (please correct me if
I am misreading our own situation). Do you have any other ideas or comments?
Many thanks.