[lustre-discuss] RobinHood fail on Lustre IEEL v2.5

Colin Faber cfaber at gmail.com
Thu Jun 8 08:59:08 PDT 2017


Sounds like an issue with the client. Have you captured console output to
determine what's failing?
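
As an aside on the log quoted below: "Too many levels of symbolic links"
is the message for ELOOP. On Linux, one way to get ELOOP is to open a
symlink with O_NOFOLLOW, so those lines may only mean the scanner hit
symlinks, not that the links actually loop. I have not checked whether
Robinhood opens entries this way, so treat that as an assumption. The entry
names also resemble the links udev creates under /dev/input/by-path, which
would suggest the scan is walking a local directory tree; that is a guess
from the names alone. A small Python sketch of the ELOOP behaviour (the
helper names are made up for illustration):

```python
import errno
import os
import tempfile


def open_nofollow_errno(path):
    """Open path read-only without following a final symlink.

    Returns 0 on success, or the errno of the failure.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    return 0


def demo():
    """Return the results for a regular file and for a symlink to it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        open(target, "w").close()   # plain regular file
        os.symlink(target, link)    # valid, non-looping symlink
        return open_nofollow_errno(target), open_nofollow_errno(link)


if __name__ == "__main__":
    regular, symlink = demo()
    print("regular file:", regular)
    print("symlink:", errno.errorcode.get(symlink, symlink))
```

None of this explains the reboot by itself, which is why the console
output matters.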

On Jun 8, 2017 8:49 AM, "Langton" <langtonn at eclipseholdings.co.za> wrote:

> I am trying to install Robinhood to manage a 4PB Lustre filesystem.
> The environment is as follows:
> IEEL Lustre 2.5
> Robinhood 2.5.5-2
> CentOS release 6.7
> kernel - 2.6.32-573.8.1.el6.x86_64
> 4PB Lustre Filesystem
> Robinhood host has 2TB RAM and 390GB disk capacity
> FDR InfiniBand fabric network
> Failover configured on all Lustre servers
>
> After installing Robinhood, I ran into a problem when starting a scan.
> The Robinhood host reboots a few seconds after I issue the scan command.
> I have checked the Robinhood logs, but they show only the following:
>
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-event-mouse: Too many levels of
> symbolic links
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/platform-pcspkr-event-spkr: Too many levels of symbolic links
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-mouse: Too many levels of symbolic
> links
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/pci-0000:00:1a.0-usb-0:1.2:1.0-event-mouse: Too many levels of
> symbolic links
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/pci-0000:00:1a.0-usb-0:1.6.1:1.1-mouse: Too many levels of symbolic
> links
> 2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on
> 23/pci-0000:00:1a.0-usb-0:1.5.1:1.1-event: Too many levels of symbolic
> links
>
> As a test, I started the robinhood-lhsm service, and it started fine
> without the initial scan.
> The command rbh-lhsm-report --fs-info returns some information, but not
> much detail.
> The command rbh-lhsm-report -a reports that file storage has never been
> checked, which means a scan is needed.
> The filesystem is currently in production. Could this be the main reason
> why the host crashes?
> The filesystem has 2.6PB of used capacity.
>
> Regards
>
> Langton