[Lustre-discuss] High difference in I/O network traffic in lustre client

Mag Gam magawake at gmail.com
Mon Feb 1 05:05:21 PST 2010


How many OSSes and OSTs do you have? What type of hardware are they
running on? What type of network connection do they use? Which OSS holds
the file you are trying to access? Are the files striped?
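
A rough sketch of how those answers could be collected on one of the
clients, in case it helps. `lfs df` and `lfs getstripe` are the standard
Lustre client tools; the mount point /mnt/lustre is only a placeholder,
and the `lustre_peers` helper is a made-up illustration that assumes LNET
is on TCP port 988, as in the netstat output quoted below.

```shell
#!/bin/sh
# Assumed placeholder; substitute the real client mount point.
LUSTRE_MNT="${LUSTRE_MNT:-/mnt/lustre}"

# List every OST (and the MDT) the client sees, with usage.
list_osts() {
    lfs df -h "$LUSTRE_MNT"
}

# Show the striping layout (stripe count, OST indices) of one file.
show_striping() {
    lfs getstripe "$1"
}

# Hypothetical helper: read `netstat -nat` output on stdin and count
# ESTABLISHED connections per remote host that involve port 988.
lustre_peers() {
    awk '$6 == "ESTABLISHED" && ($4 ~ /:988$/ || $5 ~ /:988$/) {
             split($5, peer, ":")
             count[peer[1]]++
         }
         END { for (ip in count) print ip, count[ip] }' | sort
}
```

Typical use would be `netstat -nat | lustre_peers` to see which servers
the client is talking to, and `show_striping` on one of the files being
downloaded to see which OSTs hold it.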



On Mon, Feb 1, 2010 at 4:44 AM, Lex <lexluthor87 at gmail.com> wrote:
> Hi guys,
>
> In an effort to improve our storage system's performance, I found some
> strange signs that I couldn't explain by myself, so I'm posting here in the
> hope that you can help me clarify them.
>
> I'm using Lustre clients as web servers for file downloads. When our system
> is under heavy load (about 12,000 concurrent connections across 8 web
> servers, each a Lustre client), %iowait is pushed to about 98% and the load
> average reaches about 1,000-2,000! (That is purely because of %iowait; I
> could still run almost every command normally over ssh.) I think that is a
> terrible load average. In that case, though, the in and out network traffic
> are almost the same (although only a few MB/s :( ).
>
> The odd thing is that right now, with only about 3,500 concurrent
> connections, the load average is about 50 (still too big, right?), iowait is
> about 70%, and the difference between received and transmitted network
> traffic is too high, about 10-20 MB/s (see the attached file, please).
>
> We have only about 20 connections to our local Lustre storage system:
>
> netstat -nat | grep 192.168.1.75
> tcp        0    560 192.168.1.75:1023           192.168.1.85:988            ESTABLISHED
> tcp        0      0 192.168.1.75:1023           192.168.1.81:988            ESTABLISHED
> tcp        0      0 192.168.1.75:988            192.168.1.85:1023           ESTABLISHED
> tcp        0      0 192.168.1.75:988            192.168.1.85:1022           ESTABLISHED
> tcp        0      0 192.168.1.75:988            192.168.1.81:1023           ESTABLISHED
> tcp        0      0 192.168.1.75:988            192.168.1.81:1022           ESTABLISHED
> tcp        0      0 192.168.1.75:988            192.168.1.100:1023          ESTABLISHED
> tcp        0      0 192.168.1.75:1021           192.168.1.78:988            ESTABLISHED
> tcp        0      0 192.168.1.75:1023           192.168.1.78:988            ESTABLISHED
> tcp        0      0 192.168.1.75:1022           192.168.1.78:988            ESTABLISHED
> tcp        0    560 192.168.1.75:1023           192.168.1.100:988           ESTABLISHED
>
> and about 400 connections with clients from the internet:
>
> netstat -nat | grep out_wan_ip | grep EST | wc -l
> 407
>
> We're currently using 2 Gigabit Ethernet cards, one on the 192.168.1.0/24
> network for LNET and the other with the WAN IP for delivering files out to
> the internet, and about 15 MB/s of throughput is "lost" somehow!
>
> So, my questions are:
>
> - Does anyone have an idea or hint about the high load on our Lustre
> client / web server, as described above? I followed this link
> and found that the kjournald process is the main "culprit" (on our OST,
> it was the "ll" process).
> - What causes the large difference between the receive and transmit
> directions on our Lustre client / web server?
>
>
> I'm really stressed by the poor performance of our storage system and hope
> someone here can help point me in the right direction.
>
> Any help would be highly appreciated.
>
> Best regards
>
>
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>


