[Lustre-discuss] Re: High difference in I/O network traffic in lustre client
Atul Vidwansa
Atul.Vidwansa at Sun.COM
Wed Feb 3 02:35:15 PST 2010
Hi Lex,
If you are using Lustre for small files, another blog post of mine would
be helpful. Have a look at
http://blogs.sun.com/atulvid/entry/improving_performance_of_small_files
The post includes some tips on improving performance of small files and
also has recommendations on tools to use for benchmarking.
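For small-file workloads one simple thing worth trying is to keep the stripe
count at 1 for the directories that hold them, so each small file lives on a
single OST and its layout EA stays small. A minimal sketch, run from any
client ( /mnt/lustre/smallfiles is just a placeholder path ):

    # new files created under this directory will use a single OST
    lfs setstripe -c 1 /mnt/lustre/smallfiles

    # confirm the layout that new files will inherit
    lfs getstripe /mnt/lustre/smallfiles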
Cheers,
_Atul
Lex wrote:
>
>
> did you measure the performance of this system before lustre?
> specifically
>
>
> Tell me exactly what information would be useful for you to help diagnose
> our problem, please.
>
> , your symptoms make it look like your disk system
> can't handle the load. since you have lots of small activity,
> the issue wouldn't be bandwidth, but latency. I've normally only
> seen this on the MDS, where metadata traffic can generate quite
> high numbers of transactions, even though the bandwidth is low.
>
>
>
> for instance, is the MDS volume a slow-write form of raid like
> raid5 or raid6? MDS activity is mainly small, synchronous
> transactions
> such as directory updates, which is why MDS should be on raid10.
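> ( As a rough illustration of the difference: a small random write on
> raid5/6 usually costs four disk operations - read old data, read old
> parity, write new data, write new parity - while raid10 only needs the
> two mirrored writes, so small synchronous metadata updates see roughly
> half the latency and none of the parity read-modify-write stalls. )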
>
>
> We use raid10 for our MDS and it is operating almost idle. Below is some
> info about load average and network traffic ( output from the w and bmon
> commands ). It isn't high enough to cause the delay, right ?
>
> load average: 0.05, 0.10, 0.09
>
>   Name            RX                        TX
>                   Rate        #      %      Rate        #      %
>   MDS1 (local)
>   0 lo            0 B         0             0 B         0
>   1 eth0          22 B        0             344.59KiB   736
>   2 eth1          670.49KiB   1.37K         267.29KiB   592
>   3 bond0         670.51KiB   1.38K         611.88KiB   1.30K
>
>
> are quite a lot of small files: Linux soft links ) Files are
> "striped" over
>
>
> in a normal filesystem, symlinks are stored in the inode itself,
> at least for short symlink targets. I guess that applies to
> lustre as well - the symlink would be on the MDS. but there are
> issues related to the size of the inode on the MDS, since striping
> information is also stored in EAs, which are also hopefully within
> the file's inode. when there's too much to fit into an inode,
> performance suffers, since the same metadata operations now require
> extra seeks.
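> If you want to see what is actually stored in that striping EA for a
> given file, a quick check from any client looks like this ( the paths
> are just placeholders ):
>
>     # stripe count, stripe size and the OST objects backing one file
>     lfs getstripe /mnt/lustre/path/to/file
>
>     # the default layout a directory hands to newly created files
>     lfs getstripe /mnt/lustre/path/to/dir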
>
>
> I will consider this
>
>
> each 2 OSTs, and some are striped over all our OSTs ( rather than the
> usual 2-OST parallel striping )
>
>
> whether it makes sense to stripe over all OSTs or not depends on
> the sizes of your files. but since you have only gigabit, it's
> probably not a good idea. (that is, accessing a striped file
> won't be any faster, since it'll bottleneck on the client's
> network port.)
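> To put a number on that: 1 Gbit/s is about 125 MB/s raw, and roughly
> 110-115 MB/s of usable payload once Ethernet and TCP overhead are taken
> out. A single client therefore cannot read or write faster than ~115 MB/s
> no matter how many OSTs a file is striped across, so wide striping mainly
> adds extra RPCs and per-stripe overhead for small files.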
>
>
> Could you please tell me in detail the disadvantages of 1 Gig Ethernet
> when using Lustre, and what exactly the bottleneck at the client's network
> port is ? ( I tried installing more NICs in the client and bonding them
> together, but it didn't help )
>
> I found in some paper ( got it from Google ) that if we use bonding
> devices with 3 x 1 Gig Ethernet, the problem will be significantly
> improved. But in our case I couldn't even reach the limit of 1 Gig !!!
>
>
>
> Do you have any idea for my issue ?
>
>
> I think you need to find out whether the performance problem is merely
> due to latency (metadata rate) on the MDS. looking at normal performance
> metrics on the MDS when under load (/proc/partitions, etc) might be able
> to show this. even "vmstat 1" may be informative, to see what sorts of
> blocks-per-second IO rates you're getting.
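> A concrete way to check the latency side, assuming the sysstat package
> is installed ( sda here is just a placeholder for the MDT device ):
>
>     # request rate, average wait time (await) and utilisation, once per second
>     iostat -x 1 sda
>
> High await and %util together with low MB/s would point at the MDT disks
> rather than at the network.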
>
>
> Here is the output of vmstat 1 over 10 seconds
>
> root at MDS1: ~ # vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free    buff   cache   si   so    bi    bo    in    cs us sy id wa st
>  1  0    140 243968 3314424  432776    0    0     1     6     2     1  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3037  6938  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  2980  6759  0  2 98  1  0
>  0  0    140 244216 3314424  432776    0    0     0    16  3574  8966  0  3 94  3  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3511  8639  1  2 97  1  0
>  0  1    140 244092 3314424  432776    0    0     0    36  3549  8871  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3085  7304  0  2 97  1  0
>  0  0    140 243968 3314424  432776    0    0     0    20  3199  7566  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0    16  3294  7950  0  2 95  3  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3336  8301  0  2 97  1  0
>
> and iostat -m 1 5
>
> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (MDS1) 02/02/2010
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.17 0.02 1.53 1.33 0.00 96.96
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 3.66 0.00 0.02 12304 79721
> drbd1 6.43 0.00 0.02 10709 70302
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.75 0.00 2.24 0.75 0.00 96.26
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 1.00 0.00 0.00 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 1.75 1.00 0.00 97.24
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 4.00 0.00 0.05 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 2.00 3.50 0.00 94.50
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 3.00 0.00 0.02 0 0
> drbd1 4.00 0.00 0.02 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 2.49 0.75 0.00 96.76
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 1.00 0.00 0.00 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> I don't think our MDS is too busy ( do correct me if I am reading our
> own situation wrongly, please )
>
> Do you have any ideas or comments ?
>
> Many many thanks
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>