[Lustre-discuss] lustre underperforming

Lucius luciusftp at hotmail.com
Mon Jul 11 04:33:33 PDT 2011


Hello everyone!

We chose Lustre for our large file storage system, but its performance is not what we expected.
Our users should be able to download at 1 Mbit/s, which is their cap. However, during a download the speed drops, sometimes all the way to 0; within 1-30 seconds it either recovers or the connection drops.
The servers now attached to Lustre were previously used separately as standalone file storage servers, and they performed 4 times better.
On the Lustre clients the load is between 100 and 300, because the FTP processes are waiting for data from the OSSs. The RAID arrays in the OSSs deliver 30-40k/s of disk I/O, whereas outside Lustre they deliver 100-140k/s.

Our servers:

MGS/MDT:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
4 X 4GB 1067MHz Kingston,
The Lustre metadata is stored on a 4 X 500GB RAID10 (used only for this). The server has a 1Gbit connection to a Cisco 3650 switch (all clients and OSSs are connected to this switch).
The OSS/OST servers are not all identical. We have three pairs of different servers, i.e. two of each storage type, so 6 OSS servers altogether.
We connected them as described below:

Server#1 dg0:
2 X Intel Xeon E5405 (12M Cache, 2.00 GHz, 1333 MHz FSB),
4 X 2GB 667MHz Kingston
2 X 3Ware 9650SE-24M8 RAID controllers with 48 x 1TB disks. Each controller has 3 RAID5 OSTs of 8 disks each, so this server has 6 x 6.3TB OSTs = 38TB of storage.
The server has a 2x1Gbit (bond0) Ethernet connection to the switch.

Server#2 dg1:
Exactly like server#1 dg0.

Server#3 dg2:
2 X Intel Xeon E5620 (12M Cache, 2.40 GHz, 5.86 GT/s Intel QPI),
3 X 4GB 1067MHz Kingston
1 X 3Ware 9650SE-16ML controller with 16 x 1TB disks, 3 x 5-disk RAID5 OSTs, 22TB of storage altogether.
The server has a 2x1Gbit (bond0) Ethernet connection to the switch.

Server#4 dg3:
2 X Intel Xeon E5530 (8M Cache, 2.40 GHz, 5.86 GT/s Intel QPI)
8 X 4GB 1067MHz Kingston
3 X 3Ware 9650SE-24M8 controllers with 20 disks each, so that's 60 x 500GB disks. Each controller has two RAID5 OST arrays of ten disks. Storage is 25TB.
The server has a 3x1Gbit (bond0) Ethernet connection to the switch.

Server#5 is like dg2, server#6 is like dg3
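As a rough cross-check of the capacity figures above: a RAID5 array of n disks gives n-1 disks of usable space, with one disk's worth going to parity. This sketch assumes decimal-TB disk sizes and reports binary TiB (as `df -h` shows), and it ignores filesystem overhead such as reserved blocks and inode tables; the helper names are hypothetical.

```python
# RAID5 usable capacity: one disk per array is consumed by parity.
TB = 10**12   # decimal terabyte, as disk vendors count
TIB = 2**40   # binary tebibyte, as df -h reports

def raid5_usable_bytes(disks_per_array: int, disk_bytes: int) -> int:
    """Usable bytes of one RAID5 array (n - 1 data disks)."""
    if disks_per_array < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks_per_array - 1) * disk_bytes

def server_usable_tib(arrays: int, disks_per_array: int, disk_bytes: int) -> float:
    """Total usable TiB for a server with `arrays` identical RAID5 OSTs."""
    return arrays * raid5_usable_bytes(disks_per_array, disk_bytes) / TIB

# dg0/dg1: 6 OSTs, each RAID5 of 8 x 1 TB disks
dg0_ost = raid5_usable_bytes(8, 1 * TB) / TIB
dg0 = server_usable_tib(6, 8, 1 * TB)
# dg3: 6 OSTs, each RAID5 of 10 x 500 GB disks
dg3 = server_usable_tib(6, 10, 500 * 10**9)

print(f"dg0 OST: {dg0_ost:.2f} TiB, dg0 total: {dg0:.1f} TiB, dg3 total: {dg3:.1f} TiB")
```

Under these assumptions the 8-disk arrays come out at about 6.37 TiB per OST and 38.2 TiB per dg0/dg1 server, and dg3 at about 24.6 TiB, in line with the figures listed above.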

Note: server#4 dg3 was previously part of another storage system, where it could serve 500-800 users at 2-2.5 Gbit/s of bandwidth, and even 1000 users at 2.97 Gbit/s.
The documentation says Lustre can serve even 10,000 users; although our servers are heterogeneous, we see no reason for the system to be this slow.
The clients are Intel Xeon X3440 2.53GHz machines with 3 x 2GB 1333MHz Kingston memory and hardware Xen support.
Each client host runs 3 virtual machines, so Lustre has 6 identical clients. Before this we had 6 different Intel Xeon clients and experienced the same speed problems described above.
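A back-of-envelope sketch of the network ceiling on the OSS side, using the bonded link counts listed above and the 1 Mbit/s per-user cap. It assumes every 1 Gbit/s member runs at full line rate with no TCP or LNet overhead, so real figures will be lower. It also ignores that most Linux bonding modes hash traffic per flow, which keeps a single TCP stream on one member link, and that served data crosses the switch twice (OSS to Lustre client, then client to FTP user), so the client uplinks, whose speeds are not given here, matter too. The helper names are hypothetical.

```python
# Back-of-envelope network ceiling for the OSS side (hypothetical helper names).
GBIT = 10**9   # bits per second in 1 Gbit/s
MBIT = 10**6   # bits per second in 1 Mbit/s

# Bonded 1 Gbit/s links per OSS, as listed in the post
# (server#5 is like dg2, server#6 is like dg3).
oss_links = {"dg0": 2, "dg1": 2, "dg2": 2, "server5": 2, "dg3": 3, "server6": 3}

PER_USER_BPS = 1 * MBIT  # each user's download cap

def max_users(links: int, per_user_bps: int = PER_USER_BPS) -> int:
    """Upper bound on users that `links` x 1 Gbit/s can feed at full rate.

    Assumes full line rate and no protocol overhead, so real numbers are lower.
    """
    return links * GBIT // per_user_bps

total_links = sum(oss_links.values())
print(f"aggregate OSS uplink: {total_links} Gbit/s, "
      f"at most {max_users(total_links)} users at 1 Mbit/s")
print(f"dg3 alone: at most {max_users(oss_links['dg3'])} users")
```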

Does anyone have an idea what could be causing the problem?

Thank you,
Vic