[Lustre-discuss] Re: High difference in I/O network traffic in lustre client
Atul Vidwansa
Atul.Vidwansa at Sun.COM
Wed Feb 3 02:35:15 PST 2010
Hi Lex,
If you are using Lustre for small files, another blog post of mine would
be helpful. Have a look at
http://blogs.sun.com/atulvid/entry/improving_performance_of_small_files
The post includes some tips on improving performance of small files and
also has recommendations on tools to use for benchmarking.
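For small-file workloads one simple thing worth trying is to keep the stripe
count at 1 for the directories that hold them, so each small file lives on a
single OST and its layout EA stays small. A minimal sketch, run from any
client ( /mnt/lustre/smallfiles is just a placeholder path ):

    # new files created under this directory will use a single OST
    lfs setstripe -c 1 /mnt/lustre/smallfiles

    # confirm the layout that new files will inherit
    lfs getstripe /mnt/lustre/smallfiles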
Cheers,
_Atul
Lex wrote:
>
>
> did you measure the performance of this system before lustre?
> specifically
>
>
> Tell me exactly what information would be useful for you to help diagnose
> our problem, please.
>
> , your symptoms make it look like your disk system
> can't handle the load. since you have lots of small activity,
> the issue wouldn't be bandwidth, but latency. I've normally only
> seen this on the MDS, where metadata traffic can generate quite
> high numbers of transactions, even though the bandwidth is low.
>
>
>
> for instance, is the MDS volume a slow-write form of raid like
> raid5 or raid6? MDS activity is mainly small, synchronous
> transactions
> such as directory updates, which is why MDS should be on raid10.
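> ( As a rough illustration of the difference: a small random write on
> raid5/6 usually costs four disk operations - read old data, read old
> parity, write new data, write new parity - while raid10 only needs the
> two mirrored writes, so small synchronous metadata updates see roughly
> half the latency and none of the parity read-modify-write stalls. )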
>
>
> We use raid10 for our MDS and it is operating almost idle. Below is some
> info about load average and network traffic ( output from the w and bmon
> commands ). It isn't high enough to cause the delay, right ?
>
> load average: 0.05, 0.10, 0.09
>
>   Name            RX                        TX
>                   Rate        #      %      Rate        #      %
>   MDS1 (local)
>   0 lo            0 B         0             0 B         0
>   1 eth0          22 B        0             344.59KiB   736
>   2 eth1          670.49KiB   1.37K         267.29KiB   592
>   3 bond0         670.51KiB   1.38K         611.88KiB   1.30K
>
>
> are quite a lot of small files: Linux soft links ) Files are
> "striped" over
>
>
> in a normal filesystem, symlinks are stored in the inode itself,
> at least for short symlink targets. I guess that applies to
> lustre as well - the symlink would be on the MDS. but there are
> issues related to the size of the inode on the MDS, since striping
> information is also stored in EAs, which are also hopefully within
> the file's inode. when there's too much to fit into an inode,
> performance suffers, since the same metadata operations now require
> extra seeks.
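> If you want to see what is actually stored in that striping EA for a
> given file, a quick check from any client looks like this ( the paths
> are just placeholders ):
>
>     # stripe count, stripe size and the OST objects backing one file
>     lfs getstripe /mnt/lustre/path/to/file
>
>     # the default layout a directory hands to newly created files
>     lfs getstripe /mnt/lustre/path/to/dir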
>
>
> I will consider this
>
>
> each 2 OSTs, and some are striped over all our OSTs ( rather than the
> usual 2-OST parallel striping )
>
>
> whether it makes sense to stripe over all OSTs or not depends on
> the sizes of your files. but since you have only gigabit, it's
> probably not a good idea. (that is, accessing a striped file
> won't be any faster, since it'll bottleneck on the client's
> network port.)
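> To put a number on that: 1 Gbit/s is about 125 MB/s raw, and roughly
> 110-115 MB/s of usable payload once Ethernet and TCP overhead are taken
> out. A single client therefore cannot read or write faster than ~115 MB/s
> no matter how many OSTs a file is striped across, so wide striping mainly
> adds extra RPCs and per-stripe overhead for small files.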
>
>
> Could you please tell me in detail the disadvantages of 1 Gig Ethernet
> when using Lustre, and what exactly the bottleneck at the client's network
> port is ? ( I tried installing more NICs in the client and bonding them
> together, but it didn't help )
>
> I found in some paper ( got it from Google ) that if we use bonding
> devices with 3 x 1 Gig Ethernet, the problem will be significantly
> improved. But in our case I couldn't even reach the limit of 1 Gig !!!
>
>
>
> Do you have any idea for my issue ?
>
>
> I think you need to find out whether the performance problem is merely
> due to latency (metadata rate) on the MDS. looking at normal performance
> metrics on the MDS when under load (/proc/partitions, etc) might be able
> to show this. even "vmstat 1" may be informative, to see what sorts of
> blocks-per-second IO rates you're getting.
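> A concrete way to check the latency side, assuming the sysstat package
> is installed ( sda here is just a placeholder for the MDT device ):
>
>     # request rate, average wait time (await) and utilisation, once per second
>     iostat -x 1 sda
>
> High await and %util together with low MB/s would point at the MDT disks
> rather than at the network.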
>
>
> Here is the output of vmstat 1 over 10 seconds
>
> root at MDS1: ~ # vmstat 1
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free    buff   cache   si   so    bi    bo    in    cs us sy id wa st
>  1  0    140 243968 3314424  432776    0    0     1     6     2     1  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3037  6938  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  2980  6759  0  2 98  1  0
>  0  0    140 244216 3314424  432776    0    0     0    16  3574  8966  0  3 94  3  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3511  8639  1  2 97  1  0
>  0  1    140 244092 3314424  432776    0    0     0    36  3549  8871  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3085  7304  0  2 97  1  0
>  0  0    140 243968 3314424  432776    0    0     0    20  3199  7566  0  2 97  1  0
>  0  0    140 244092 3314424  432776    0    0     0    16  3294  7950  0  2 95  3  0
>  0  0    140 244092 3314424  432776    0    0     0     4  3336  8301  0  2 97  1  0
>
> and iostat -m 1 5
>
> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (MDS1) 02/02/2010
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.17 0.02 1.53 1.33 0.00 96.96
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 3.66 0.00 0.02 12304 79721
> drbd1 6.43 0.00 0.02 10709 70302
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.75 0.00 2.24 0.75 0.00 96.26
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 1.00 0.00 0.00 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 1.75 1.00 0.00 97.24
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 4.00 0.00 0.05 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 2.00 3.50 0.00 94.50
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 3.00 0.00 0.02 0 0
> drbd1 4.00 0.00 0.02 0 0
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 2.49 0.75 0.00 96.76
>
> Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
> sda 1.00 0.00 0.00 0 0
> drbd1 1.00 0.00 0.00 0 0
>
> I don't think our MDS is too busy ( do correct me if I am reading our
> own situation wrongly, please )
>
> Do you have any ideas or comments ?
>
> Many many thanks
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>