[lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

Jessica Otey jotey at nrao.edu
Fri May 19 07:23:51 PDT 2017


I think that may be a red herring related to rsyslog?  When we most 
recently rebooted the MDT, this is the log (still on the box, not on the 
log server):

May  3 14:24:22 asimov kernel: LNet: HW CPU cores: 12, npartitions: 4
May  3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]

And lctl list_nids gives it once:

[root@asimov ~]# lctl list_nids
10.7.17.8@o2ib
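
[As a quick sanity check along the same lines, one could count the "Added
LNI" registrations for that NID in the local log. A sketch; the sample
line below is taken from the log excerpt above, and on a real MDS you
would grep /var/log/messages or wherever rsyslog writes kernel messages:]

```shell
# Count how many times this NID registered with LNet in the log.
# A healthy single-homed node should show exactly one registration
# per boot; two hits in one boot would match Jeff's "registering
# twice" observation.
sample='May  3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]'
printf '%s\n' "$sample" | grep -c 'Added LNI 10.7.17.8'
```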

Jessica

On 5/19/17 10:13 AM, Jeff Johnson wrote:
> Jessica,
>
> You are getting a NID registering twice. Doug noticed and pointed it 
> out. I'd look to see if that is one machine doing something twice or 
> two machines with the same NID.
>
> --Jeff
>
> On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonunit at gmail.com 
> <mailto:dobsonunit at gmail.com>> wrote:
>
>     Greetings Jessica,
>
>     I'm not sure I am correctly understanding the behavior "robinhood
>     activity floods the MDT".   The robinhood program as you (and I)
>     are using it is consuming the MDT CHANGELOG via a reader_id which
>     was assigned when the CHANGELOG was enabled on the MDT. You can
>     check the MDS for these readers via "lctl get_param
>     mdd.*.changelog_users".  Each CHANGELOG reader must either be
>     consumed by a process or destroyed; otherwise the CHANGELOG will
>     grow until it consumes sufficient space to stop the MDT from
>     functioning correctly.  So robinhood should consume and then clear
>     the CHANGELOG via this reader_id.  This implementation of
>     robinhood is actually a rather light-weight process as far as the
>     MDS is concerned.   The load issues I encountered were on the
>     robinhood server itself which is a separate server from the Lustre
>     MGS/MDS server.
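
[A sketch of what that check might look like. The device name
lustre-MDT0000, the reader IDs, and the index values below are
illustrative, not taken from either poster's system:]

```shell
# Illustrative output of: lctl get_param mdd.*.changelog_users
# (device name and index values are hypothetical)
sample='mdd.lustre-MDT0000.changelog_users=
current index: 52000
ID    index
cl1   52000
cl2   11830'

# List registered reader IDs and their indexes. A reader whose index
# lags far behind "current index" is not consuming records, so the
# changelog grows until it threatens the MDT.
printf '%s\n' "$sample" | awk '/^cl[0-9]+/ {print $1, $2}'

# A stale reader could then be removed on the MDS with something like
# (hypothetical device name):
#   lctl --device lustre-MDT0000 changelog_deregister cl2
```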
>
>     Just curious, have you checked for multiple reader_id's on your
>     MDS for this Lustre file system?
>
>     P.S. My robinhood configuration file is using nb_threads = 8, just
>     for a data point.
>
>     Cheers,
>     megan
>
>     On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jotey at nrao.edu
>     <mailto:jotey at nrao.edu>> wrote:
>
>         Hi Megan,
>
>         Thanks for your input. We use percona, a drop-in replacement
>         for mysql... The robinhood activity floods the MDT, but it
>         does not seem to produce any excessive load on the robinhood
>         box...
>
>         Anyway, FWIW...
>
>         ~]# mysql --version
>         mysql  Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using
>         readline 5.1
>
>         Product:         robinhood
>         Version:         3.0-1
>         Build:           2017-03-13 10:29:26
>
>         Compilation switches:
>             Lustre filesystems
>             Lustre Version: 2.5
>             Address entries by FID
>             MDT Changelogs supported
>
>         Database binding: MySQL
>
>         RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64
>
>         Lustre rpms:
>
>         lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>         lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
>
>
>         On 5/18/17 11:55 AM, Ms. Megan Larko wrote:
>>         With regards to (WRT) Subject "Robinhood exhausting RPC
>>         resources against 2.5.5  lustre file systems", what version
>>         of robinhood and what version of MySQL database?   I mention
>>         this because I have been working with robinhood-3.0-0.rc1 and
>>         initially MySQL-5.5.32 and Lustre 2.5.42.1 on
>>         kernel-2.6.32-573, and had issues in which robinhood
>>         saturated all 32 CPU cores on the robinhood server (with
>>         128 GB RAM) and would functionally hang it.  The issue was
>>         solved for me by changing to MySQL-5.6.35; it was the
>>         "sort" operation in robinhood that was not working well
>>         with MySQL-5.5.32.
>>
>>         Cheers,
>>         megan
>>
>
>
>     _______________________________________________
>     lustre-discuss mailing list
>     lustre-discuss at lists.lustre.org
>     <mailto:lustre-discuss at lists.lustre.org>
>     http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> -- 
> ------------------------------
> Jeff Johnson
> Co-Founder
> Aeon Computing
>
> jeff.johnson at aeoncomputing.com <mailto:jeff.johnson at aeoncomputing.com>
> www.aeoncomputing.com <http://www.aeoncomputing.com>
> t: 858-412-3810 x1001   f: 858-412-3845
> m: 619-204-9061
>
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>
> High-Performance Computing / Lustre Filesystems / Scale-out Storage
