[Lustre-discuss] Reply High difference in I/O network traffic in lustre client

Lex lexluthor87 at gmail.com
Mon Feb 1 07:29:18 PST 2010


From: Lex <lexluthor87 at gmail.com>
Date: Mon, Feb 1, 2010 at 10:28 PM
Subject: Re: [Lustre-discuss] High difference in I/O network traffic in
lustre client
To: Mag Gam <magawake at gmail.com>


I have 8 OSSs and 8 OSTs. Hardware info:

CPU: Intel(R) Xeon E5420 2.5 GHz, chipset Intel 5000P
8 GB RAM
8 x 1.5 TB hard disks, divided into 2 arrays on an Adaptec 5805 RAID
controller

We are using 2 x 1 Gigabit Ethernet cards with Linux bonding (the OS is
CentOS 5.3). Our Lustre clients work as web servers for file downloads, so
many files are being read by web clients; I can't give you an exact number.
(We have millions of files in our Lustre storage system and, unfortunately,
quite a lot of them are small files: Linux soft links.) Files are striped
over 2 OSTs each; some are striped over all our OSTs (fewer than those
striped over 2 OSTs).
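
To make the striping layout concrete, here is a minimal Python sketch of how
a byte offset maps to a stripe under plain round-robin (RAID-0 style)
striping. The function name and the 1 MiB stripe size are assumptions for
illustration only; they are not values read from our configuration.

```python
def stripe_index(offset, stripe_size=1 << 20, stripe_count=2):
    """Return the index (0..stripe_count-1) of the stripe holding `offset`.

    Assumes plain round-robin striping. stripe_size is a hypothetical
    1 MiB, and stripe_count=2 matches the "striped over 2 OSTs" case.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return (offset // stripe_size) % stripe_count


# A file striped over 2 OSTs alternates between them every stripe_size bytes.
for off in (0, 1 << 20, 2 << 20, 3 << 20):
    print(off, "->", stripe_index(off))
```

Under this model, a file smaller than one stripe sits entirely in the first
stripe, so striping it more widely adds no read parallelism.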

Do you have any ideas about my issue?

Many thanks


On Mon, Feb 1, 2010 at 8:05 PM, Mag Gam <magawake at gmail.com> wrote:

> How many OSSs and OSTs do you have? What type of hardware are they
> running on? What type of network connection? Which OSS holds the file
> you are trying to access? Are the files striped?
>
>
>
> What
>
> On Mon, Feb 1, 2010 at 4:44 AM, Lex <lexluthor87 at gmail.com> wrote:
> > Hi guys
> >
> > In an effort to improve our storage system's performance, I found some
> > strange signs that, unfortunately, I couldn't explain myself. So I am
> > posting here in the hope that you guys can help me clarify them.
> >
> > I'm using Lustre clients as web servers for file downloads. When our
> > system is under heavy load (about 12,000 concurrent connections across
> > 8 web servers, each a Lustre client), %iowait is pushed to about 98% and
> > the load average is about 1-2000 !!!! (This is due to %iowait alone; I
> > could still run almost every command normally over ssh.) I think that is
> > a terrible load average! In that case, though, the inbound and outbound
> > network traffic are almost the same (although only a few MB/s :( ).
> >
> > The odd thing is that right now, with only about 3,500 concurrent
> > connections, the load average is about 50 (still too big, right?),
> > iowait is about 70%, and the difference between received and
> > transmitted network traffic is too high, about 10-20 MB/s (please see
> > the attached file).
> >
> > We have only about 20 connections to our local Lustre storage system:
> >
> > netstat -nat | grep 192.168.1.75
> > tcp        0    560 192.168.1.75:1023           192.168.1.85:988            ESTABLISHED
> > tcp        0      0 192.168.1.75:1023           192.168.1.81:988            ESTABLISHED
> > tcp        0      0 192.168.1.75:988            192.168.1.85:1023           ESTABLISHED
> > tcp        0      0 192.168.1.75:988            192.168.1.85:1022           ESTABLISHED
> > tcp        0      0 192.168.1.75:988            192.168.1.81:1023           ESTABLISHED
> > tcp        0      0 192.168.1.75:988            192.168.1.81:1022           ESTABLISHED
> > tcp        0      0 192.168.1.75:988            192.168.1.100:1023          ESTABLISHED
> > tcp        0      0 192.168.1.75:1021           192.168.1.78:988            ESTABLISHED
> > tcp        0      0 192.168.1.75:1023           192.168.1.78:988            ESTABLISHED
> > tcp        0      0 192.168.1.75:1022           192.168.1.78:988            ESTABLISHED
> > tcp        0    560 192.168.1.75:1023           192.168.1.100:988           ESTABLISHED
> >
> > and about 400 connections with clients from the internet:
> >
> > netstat -nat | grep out_wan_ip | grep EST | wc -l
> > 407
> >
> > We're currently using 2 Gigabit Ethernet cards: one on the
> > 192.168.1.0/24 network for LNET and the other with the WAN IP for
> > delivering files to the internet, and about 15 MB/s of throughput is
> > "lost" somehow !!!!
> >
> > So, my questions are:
> >
> > - Does anyone have an idea or hint about the high-load situation on our
> > Lustre clients (the web servers) that I described above? I followed this
> > link and found that the kjournald process is the main "culprit" (on our
> > OSTs, it was the "ll" process).
> > - What causes the large difference between the receive and transmit
> > directions on our Lustre clients (the web servers)?
> >
> >
> > I'm really stressed by the poor performance of our storage system and
> > hope someone here can help me point something out.
> >
> > Any help would be highly appreciated
> >
> > Best regards
> >
> >
> >
> >
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >
> >
>
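
On the receive/transmit gap described in the quoted message, one way to put
a number on it per interface is to sample the byte counters in /proc/net/dev
twice and compare. The following is only a rough Python 3 sketch: the
function names are hypothetical, the sample counters are made up for
illustration (they are not measurements from our servers), and counter
wrap-around is ignored.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_mb_per_s(before, after, seconds):
    """Return {interface: (rx_MB_s, tx_MB_s)} between two counter samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue  # interface appeared between the two samples
        rx0, tx0 = before[name]
        rates[name] = ((rx1 - rx0) / seconds / 1e6,
                       (tx1 - tx0) / seconds / 1e6)
    return rates


HEADER = ("Inter-|   Receive                            |  Transmit\n"
          " face |bytes packets errs drop fifo frame compressed multicast"
          "|bytes packets errs drop fifo colls carrier compressed\n")
# Made-up samples taken one second apart on a hypothetical eth0.
SAMPLE_T0 = HEADER + "  eth0: 1000000 10 0 0 0 0 0 0 2000000 10 0 0 0 0 0 0\n"
SAMPLE_T1 = HEADER + "  eth0:31000000 40 0 0 0 0 0 0 12000000 20 0 0 0 0 0 0\n"

rates = rate_mb_per_s(parse_net_dev(SAMPLE_T0), parse_net_dev(SAMPLE_T1), 1)
for iface, (rx, tx) in rates.items():
    print(iface, "rx MB/s:", rx, "tx MB/s:", tx, "gap MB/s:", rx - tx)
```

On a live client, the two samples would come from reading /proc/net/dev a
few seconds apart instead of from the strings above. Comparing the LNET
interface's receive rate with the WAN interface's transmit rate shows how
much data is read from storage but never delivered to web clients.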