[lustre-discuss] Help troubleshooting a scalability issue

Sun Jan 22 11:02:59 PST 2017

On Fri, Jan 20, 2017 at 4:20 PM, Massimo Sgaravatto <
massimo.sgaravatto at pd.infn.it> wrote:

> Hi
>
> I have a Lustre cluster composed of 1 MDS and 2 OSS servers.
> Clients are both physical machines (~ 25 boxes) and virtual machines
> (instantiated on an OpenStack cluster). These virtual machines are
> dynamically created and destroyed as needed (we have a mechanism that
> provides this elasticity automatically). They access the Lustre cluster
> through a NAT.
>
Did you check whether you are running out of available ports for maintaining
open connections? What about the switching capacity of the virtual
switch/router, the throughput on its interface, and its RAM/CPU usage?
These are not really on the Lustre side of things, but they could be causing
trouble and can be ruled out fairly easily.
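
As a rough illustration of the first check, here is a minimal Python sketch
that counts established TCP connections to the Lustre port on a server,
grouped by remote address. It assumes a Linux server, IPv4 only, a
little-endian host (e.g. x86), and the default LNet acceptor port 988; the
function names are made up for this example. Since the VMs sit behind a NAT,
they would all show up under the NAT's address.

```python
import socket
import struct
from collections import Counter

LUSTRE_PORT = 988        # default LNet acceptor port; adjust if changed locally
TCP_ESTABLISHED = "01"   # state code used in /proc/net/tcp


def decode_addr(field):
    """Turn '0100007F:03DC' into ('127.0.0.1', 988) (little-endian host assumed)."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def count_peers(lines, port=LUSTRE_PORT):
    """Count established connections to `port`, grouped by remote IP."""
    peers = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue  # skip the header line and anything malformed
        _, local_port = decode_addr(fields[1])
        remote_ip, _ = decode_addr(fields[2])
        if local_port == port and fields[3] == TCP_ESTABLISHED:
            peers[remote_ip] += 1
    return peers


if __name__ == "__main__":
    # Hypothetical usage: run on the MDS or an OSS.
    with open("/proc/net/tcp") as f:
        for ip, n in count_peers(f).most_common():
            print(f"{ip}\t{n}")
```

If the count for the NAT address stops growing while new VMs are still being
started, that would point at the NAT rather than at Lustre itself.
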
HTH,
Eli

>
> We start having problems when the number of virtual machines reaches a
> certain value (about 130-140).
> At that point we can no longer mount Lustre on new clients, and access to
> the Lustre file system becomes very slow.
>
>
> In the OSS and MDS syslogs I see a lot of errors, such as:
>
> Request sent has timed out for slow reply
> bulk GET failed
> Request sent has failed due to network error
> lock blocking callback time out
>
> In:
>
> https://dl.dropboxusercontent.com/u/7639059/LustreLog/lustre-mds.txt
> https://dl.dropboxusercontent.com/u/7639059/LustreLog/lustre-oss-01.txt
> https://dl.dropboxusercontent.com/u/7639059/LustreLog/lustre-oss-03.txt
>
> I saved a copy of these syslogs (just related to Lustre, and just for a
> time slot when the problem happened).
> In this example 10.64.22.248 is a new VM that is not able to mount the
> lustre filesystem.
>
>
> The network is not saturated when the problem happens, and the Lustre
> servers do not appear heavily loaded.
>
> I would appreciate any hints that could help in troubleshooting this issue.
>
>
> Thanks, Massimo