[Lustre-discuss] Reply High difference in I/O network traffic in lustre client

Lex lexluthor87 at gmail.com
Tue Feb 2 01:11:28 PST 2010


> did you measure the performance of this system before lustre?


Please tell me exactly what information would be useful for you to help me
diagnose our problem.

> specifically, your symptoms make it look like your disk system
> can't handle the load.  since you have lots of small activity,
> the issue wouldn't be bandwidth, but latency.  I've normally only
> seen this on the MDS, where metadata traffic can generate quite high
> numbers of transactions, even though the bandwidth is low.
>


> for instance, is the MDS volume a slow-write form of raid like raid5 or
> raid6?  MDS activity is mainly small, synchronous transactions
> such as directory updates, which is why MDS should be on raid10.
>

We use RAID10 for our MDS and it is fairly idle. Below is some information
about the load average and network traffic (output from the w and bmon
commands). It isn't high enough to cause the delay, right?

load average: 0.05, 0.10, 0.09

  #   Name              RX Rate    RX #/s      TX Rate    TX #/s
      MDS1 (local)
  0   lo                    0 B         0          0 B         0
  1   eth0                 22 B         0    344.59KiB       736
  2   eth1            670.49KiB     1.37K    267.29KiB       592
  3   bond0           670.51KiB     1.38K    611.88KiB     1.30K


>> are quite a lot small file: a linux soft links )  Files are "striped" over
>
> in a normal filesystem, symlinks are stored in the inode itself, at least
> for short symlink targets.  I guess that applies to lustre as well - the
> symlink would be on the MDS.  but there are issues related to the size of
> the inode on the MDS, since striping information is also stored in EAs
> which are also hopefully within the file's inode.  when there's too much to
> fit into an inode, performance suffers, since the same metadata operations
> now require extra seeks.
>

I will consider this.
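
For reference, here is roughly how I plan to check the striping layout (the
EA data you mention) on some of these files from a client. This is just a
sketch; the path is only an example from our tree:

  # Example only (path is made up): show the stripe count, stripe size and
  # the OST objects for one of the small files -- this is the layout
  # information kept with the file's inode on the MDS
  lfs getstripe /lustre/somedir/somefile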


>> each 2 OSTs, some are striped over all our OSTs ( fewer than 2 OSTs
>> parallel striping )
>
> whether it makes sense to stripe over all OSTs or not depends on the sizes
> of your files.  but since you have only gigabit, it's probably not a good
> idea.  (that is, accessing a striped file won't be any faster, since it'll
> bottleneck on the client's network port.)
>

Could you please explain in detail the disadvantages of 1 Gigabit Ethernet
for Lustre, and what exactly the bottleneck at the client's network port is?
(I tried installing more NICs in the client and bonding them together, but it
didn't help.)

I found a paper (via Google) saying that bonding 3 x 1 Gigabit Ethernet
devices should improve the problem significantly. But in our case I couldn't
even reach the limit of 1 Gigabit!
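
One thing I will double-check on both the client and the MDS is the bonding
mode, since (as far as I understand) with 802.3ad/LACP or the xor modes a
single TCP stream is hashed onto one slave and can never exceed 1 Gigabit;
only balance-rr spreads one stream across the links. A quick look, assuming
the bond device is named bond0 as above:

  # Shows the bonding mode and whether all slave NICs are up
  cat /proc/net/bonding/bond0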


>> Do you have any idea for my issue ?
>
> I think you need to find out whether the performance problem is merely
> due to latency (metadata rate) on the MDS.  looking at normal performance
> metrics on the MDS when under load (/proc/partitions, etc) might be able
> to show this.  even "vmstat 1" may be informative, to see what sorts of
> blocks-per-second IO rates you're getting.
>
>
Here is the output of vmstat 1 over 10 seconds:

root@MDS1:~ # vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    140 243968 3314424 432776    0    0     1     6    2    1  0  2 97  1  0
 0  0    140 244092 3314424 432776    0    0     0     4 3037 6938  0  2 97  1  0
 0  0    140 244092 3314424 432776    0    0     0     4 2980 6759  0  2 98  1  0
 0  0    140 244216 3314424 432776    0    0     0    16 3574 8966  0  3 94  3  0
 0  0    140 244092 3314424 432776    0    0     0     4 3511 8639  1  2 97  1  0
 0  1    140 244092 3314424 432776    0    0     0    36 3549 8871  0  2 97  1  0
 0  0    140 244092 3314424 432776    0    0     0     4 3085 7304  0  2 97  1  0
 0  0    140 243968 3314424 432776    0    0     0    20 3199 7566  0  2 97  1  0
 0  0    140 244092 3314424 432776    0    0     0    16 3294 7950  0  2 95  3  0
 0  0    140 244092 3314424 432776    0    0     0     4 3336 8301  0  2 97  1  0

and iostat -m 1 5

Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (MDS1)      02/02/2010

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.02    1.53    1.33    0.00   96.96

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.66         0.00         0.02      12304      79721
drbd1             6.43         0.00         0.02      10709      70302

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    2.24    0.75    0.00   96.26

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.00         0.00         0.00          0          0
drbd1             1.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.75    1.00    0.00   97.24

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               4.00         0.00         0.05          0          0
drbd1             1.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.00    3.50    0.00   94.50

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               3.00         0.00         0.02          0          0
drbd1             4.00         0.00         0.02          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.49    0.75    0.00   96.76

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sda               1.00         0.00         0.00          0          0
drbd1             1.00         0.00         0.00          0          0

I don't think our MDS is too busy (please do correct me if my reading of our
situation is wrong).
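
To double-check the latency theory from the client side, I can also time a
small, purely metadata-bound test (just a rough sketch; the directory path is
made up):

  # Create and then stat 10000 empty files and see how long each phase takes;
  # files/second here is dominated by MDS round-trips and commits, not bandwidth
  mkdir /lustre/lattest
  time bash -c 'for i in $(seq 1 10000); do touch /lustre/lattest/f$i; done'
  time bash -c 'for i in $(seq 1 10000); do stat /lustre/lattest/f$i > /dev/null; done'
  rm -rf /lustre/lattest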

Do you have any ideas or comments?

Many many thanks

