[lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems
Jessica Otey
jotey at nrao.edu
Fri May 19 07:23:51 PDT 2017
I think that may be a red herring related to rsyslog? When we most
recently rebooted the MDT, this is the log (still on the box, not on the
log server):
May 3 14:24:22 asimov kernel: LNet: HW CPU cores: 12, npartitions: 4
May 3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8 at o2ib [8/256/0/180]
And lctl list_nids gives it once:
[root at asimov ~]# lctl list_nids
10.7.17.8 at o2ib
Jessica
On 5/19/17 10:13 AM, Jeff Johnson wrote:
> Jessica,
>
> You are getting a NID registering twice. Doug noticed and pointed it
> out. I'd look to see if that is one machine doing something twice or
> two machines with the same NID.
>
> --Jeff
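
Jeff's check can be scripted. The sketch below is hypothetical: it parses a captured sample of "host: nid" lines (the kind of output you might gather with something like `pdsh -w node[01-04] 'lctl list_nids'`) and flags any NID reported by more than one machine. Hostnames and NIDs here are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: find NIDs reported by more than one host.
# On a real cluster you would first gather output such as:
#   pdsh -w node[01-04] 'lctl list_nids'
# Below we parse a captured sample of that "host: nid" output instead.
sample='node01: 10.7.17.8@o2ib
node02: 10.7.17.9@o2ib
node03: 10.7.17.8@o2ib'

# Second field is the NID; uniq -d keeps only NIDs seen more than once.
dups=$(printf '%s\n' "$sample" | awk '{print $2}' | sort | uniq -d)
echo "duplicate NIDs: $dups"
```

If this prints anything, two machines are announcing the same NID and one of them needs its LNet configuration fixed.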
>
> On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonunit at gmail.com
> <mailto:dobsonunit at gmail.com>> wrote:
>
> Greetings Jessica,
>
> I'm not sure I am correctly understanding the behavior "robinhood
> activity floods the MDT". The robinhood program, as you (and I)
> are using it, consumes the MDT CHANGELOG via a reader_id that
> was assigned when the CHANGELOG was enabled on the MDT. You can
> check the MDS for these readers via "lctl get_param
> mdd.*.changelog_users". Each CHANGELOG reader must either be
> consumed by a process or destroyed; otherwise the CHANGELOG will
> grow until it consumes enough space to stop the MDT from
> functioning correctly. So robinhood should consume and then clear
> the CHANGELOG via this reader_id. This implementation of
> robinhood is actually a rather lightweight process as far as the
> MDS is concerned. The load issues I encountered were on the
> robinhood server itself, which is a separate server from the
> Lustre MGS/MDS server.
>
> Just curious, have you checked for multiple reader_ids on your
> MDS for this Lustre file system?
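
Megan's reader_id check can be sketched as follows. On a real MDS you would run `lctl get_param mdd.*.changelog_users` directly; here we parse a captured sample of that output (format assumed from Lustre 2.5-era systems, with made-up index values) and count the registered readers.

```shell
#!/bin/sh
# Hypothetical sketch: spotting multiple changelog readers on the MDS.
# On a real MDS you would run:
#   lctl get_param mdd.*.changelog_users
# Here we parse a captured sample of that output instead.
sample='mdd.lustre-MDT0000.changelog_users=
current index: 521234
ID    index
cl1   521230
cl2   498001'

# Each reader_id starts with "cl"; a reader whose index lags far behind
# "current index" is a candidate for being stale.
readers=$(printf '%s\n' "$sample" | awk '$1 ~ /^cl[0-9]+$/ {print $1}')
count=$(printf '%s\n' "$readers" | wc -l | tr -d ' ')
echo "registered changelog readers: $count"
for r in $readers; do
    echo "reader: $r"
done
# A stale reader can be deregistered on the MDS (destructive, run with care):
#   lctl --device lustre-MDT0000 changelog_deregister cl2
```

More than one reader means something besides robinhood is (or once was) registered, and the changelog will grow until every registered reader consumes its records.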
>
> P.S. My robinhood configuration file is using nb_threads = 8, just
> for a data point.
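
For reference, the thread count Megan mentions lives in the robinhood configuration file. A minimal excerpt might look like the following; the block name and file path are assumed from robinhood 3's usual C-style block syntax and may differ in your version, so verify against your own config:

```
# /etc/robinhood.d/<fsname>.conf excerpt (hypothetical)
EntryProcessor
{
    # number of worker threads processing changelog records
    nb_threads = 8;
}
```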
>
> Cheers,
> megan
>
> On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jotey at nrao.edu
> <mailto:jotey at nrao.edu>> wrote:
>
> Hi Megan,
>
> Thanks for your input. We use Percona, a drop-in replacement
> for MySQL... The robinhood activity floods the MDT, but it
> does not seem to produce any excessive load on the robinhood
> box...
>
> Anyway, FWIW...
>
> ~]# mysql --version
> mysql Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using
> readline 5.1
>
> Product: robinhood
> Version: 3.0-1
> Build: 2017-03-13 10:29:26
>
> Compilation switches:
> Lustre filesystems
> Lustre Version: 2.5
> Address entries by FID
> MDT Changelogs supported
>
> Database binding: MySQL
>
> RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
>
> Lustre rpms:
>
> lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
> lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>
>
> On 5/18/17 11:55 AM, Ms. Megan Larko wrote:
>> With regards to (WRT) the Subject "Robinhood exhausting RPC
>> resources against 2.5.5 lustre file systems", what version
>> of robinhood and what version of MySQL? I mention this
>> because I have been working with robinhood-3.0-0.rc1,
>> initially with MySQL-5.5.32 and Lustre 2.5.42.1 on
>> kernel-2.6.32-573, and had issues in which robinhood
>> saturated all 32 CPU cores on the robinhood server (which
>> has 128 G RAM) and would functionally hang it. The issue
>> was solved for me by changing to MySQL-5.6.35; it was the
>> "sort" operation in robinhood that was not working well
>> with MySQL-5.5.32.
>>
>> Cheers,
>> megan
>>
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> <mailto:lustre-discuss at lists.lustre.org>
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> --
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.johnson at aeoncomputing.com <mailto:jeff.johnson at aeoncomputing.com>
> www.aeoncomputing.com <http://www.aeoncomputing.com>
> t: 858-412-3810 x1001 f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage