[lustre-discuss] Bandwidth bottleneck at socket?

Patrick Farrell paf at cray.com
Wed Aug 30 09:43:31 PDT 2017


Brian,


Hm.  At least from what you said, I see no reason to implicate *sockets* rather than *clients*.  (And there are, in general, no socket level issues you should expect.  The bandwidth in and out of a socket generally dwarfs available network bandwidth.  There are occasionally some NUMA issues, but they shouldn't come up with simple i/o like this.)


Best of luck - I bet the OSC tuning will help.


- Patrick

________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Brian Andrus <toomuchit at gmail.com>
Sent: Wednesday, August 30, 2017 11:39:48 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [lustre-discuss] Bandwidth bottleneck at socket?


Patrick,


By socket-level, I am referring to a physical socket. It seems that increasing the number of cores used for an mpirun or ior run doesn't increase total throughput unless the additional cores are on another physical socket.
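
(One way to confirm that is to pin all the workers to a single socket and compare against a run spread across both sockets; the flags below assume Open MPI and are only illustrative, as is the program name.)

    # all workers restricted to socket 0 / NUMA node 0
    mpirun -np 14 --bind-to none numactl --cpunodebind=0 --membind=0 ./copy_program <args>
    # same worker count spread across both sockets
    mpirun -np 14 --map-by socket ./copy_program <args>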

I'm pretty sure the network and OSTs can handle the traffic. I have tested the network to 40Gb/s with iperf, and the OSTs are all NVMe.

I have tested with 1, 2, and 3 clients using an MPI-IO copy program. It reads from one file on Lustre and writes it to another, with each worker reading in its portion of the file.


Hmm. I shall try doing multiple copies at the same time to see what happens. That, I hadn't tested.
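
(Something along these lines, i.e. two instances of the copy job running at once and writing to different target files; the program name and paths are just placeholders.)

    mpirun -np 2 ./mpiio_copy /mnt/lustre/src /mnt/lustre/dst.1 &
    mpirun -np 2 ./mpiio_copy /mnt/lustre/src /mnt/lustre/dst.2 &
    wait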

We are using Lustre 2.10.51-1 under CentOS 7, kernel 3.10.0-514.26.2.

Brian

On 8/30/2017 9:32 AM, Patrick Farrell wrote:

Brian,


I'm not sure what you mean by "socket level".


A starter question:
How fast are your OSTs?  Are you sure the limit isn't the OST?  (Easy way to test: write multiple files on that OST from multiple clients and see how that performs.)

(lfs setstripe -i [index] to set the OST for a singly striped file)
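
(For example, something like this; the mount point, file names, sizes, and OST index are only illustrative.)

    # pre-create two single-stripe files pinned to OST index 0
    lfs setstripe -c 1 -i 0 /mnt/lustre/ost0_test.a
    lfs setstripe -c 1 -i 0 /mnt/lustre/ost0_test.b
    # write to each from a different client and add up the throughput
    dd if=/dev/zero of=/mnt/lustre/ost0_test.a bs=1M count=16384    # on client 1
    dd if=/dev/zero of=/mnt/lustre/ost0_test.b bs=1M count=16384    # on client 2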


In general, you can get ~1.3-1.8 GB/s from one process to one file with a recent-ish Xeon, if your OSTs and network can handle it.  There are a number of other factors that can get involved in limiting your bandwidth with multiple threads.


It sounds like you're always (in the numbers you report) using one client at a time.  Is that correct?


I suspect that you're limited in bandwidth to a specific OST, either by the OST or by the client settings.  What's your bandwidth limit from one client to multiple files on the same OST?  Is it that same 1.5 GB/s?


If so (or even if it's close), you may need to increase your client's RPC size (max_pages_per_rpc in /proc/fs/lustre/osc/[OST]/) or max_rpcs_in_flight (same place).  Note that if you increase those, you also need to increase max_dirty_mb (again, same place).  The manual describes the relationship.
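
(As a sketch, using lctl on a client; the values are only examples, so check the current settings first and see the tuning chapter of the manual for how max_dirty_mb relates to the other two.)

    # current values
    lctl get_param osc.*.max_pages_per_rpc osc.*.max_rpcs_in_flight osc.*.max_dirty_mb
    # example increases (not persistent across a remount)
    lctl set_param osc.*.max_pages_per_rpc=1024
    lctl set_param osc.*.max_rpcs_in_flight=16
    lctl set_param osc.*.max_dirty_mb=512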


Also - What version of Lustre are you running?  Client & server.
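
(Something like this on each node will show it.)

    lctl get_param version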


- Patrick

________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Brian Andrus <toomuchit at gmail.com>
Sent: Wednesday, August 30, 2017 11:16:08 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Bandwidth bottleneck at socket?

All,

I've been doing various performance tests on a small Lustre
filesystem, and there seems to be a consistent bottleneck of ~700MB/s
per CPU socket involved.

We have 6 servers with 2 Intel E5-2695 chips in each.

3 of them are clients, 1 is the MGS, and 2 are OSSes with 1 OST each.
Everything is connected with 40Gb Ethernet.

When I write to a single stripe, the best throughput I see is about
1.5GB/s. That doubles if I write to a file that has 2 stripes.
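
(A 2-stripe test file can be set up ahead of the write with something
along the lines of the following; the path is just a placeholder.)

    lfs setstripe -c 2 /mnt/lustre/test_2stripe
    dd if=/dev/zero of=/mnt/lustre/test_2stripe bs=1M count=16384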

If I do a parallel copy (using MPI-IO), I can get 1.5GB/s from a single
machine, whether I use 28 cores or 2 cores. If I only use 1, it goes
down to ~700MB/s.

Is there a bandwidth bottleneck that can occur at the socket level of a
system? It really seems that way.


Brian Andrus

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


