[lustre-discuss] RobinHood fail on Lustre IEEL v2.5
Langton
langtonn at eclipseholdings.co.za
Thu Jun 8 07:49:45 PDT 2017
I am trying to install Robinhood to manage a 4 PB Lustre filesystem.
The environment is as follows:
IEEL Lustre 2.5
Robinhood 2.5.5-2
CentOS release 6.7
Kernel 2.6.32-573.8.1.el6.x86_64
4 PB Lustre filesystem
Robinhood host: 2 TB RAM, 390 GB disk capacity
FDR InfiniBand fabric network
Failover configured on all Lustre servers
After installing Robinhood, I run into a problem when I start a scan:
for some reason the Robinhood (RBH) host reboots a few seconds after I
issue the scan command. I have checked the Robinhood logs, and all they
show is the following:
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-event-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/platform-pcspkr-event-spkr: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.2:1.0-event-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.1-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.5.1:1.1-event: Too many levels of symbolic links
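For context on the log message itself: "Too many levels of symbolic links" is the text for ELOOP, and on Linux openat() returns ELOOP when it is called with O_NOFOLLOW on an entry that is itself a symlink. The logged names also look like device links of the kind found under /dev/input/by-path. That Robinhood's scanner opens entries this way is an assumption, not something the logs confirm. The sketch below only reproduces the error on a scratch symlink; the link and target names are hypothetical.

```python
import errno
import os
import tempfile


def open_symlink_nofollow_errno(target: str = "event0", link_name: str = "event-mouse") -> int:
    """Create a symlink in a scratch directory, then openat() it with O_NOFOLLOW.

    Returns the errno of the failed open, or 0 if the open unexpectedly succeeded.
    link_name and target are hypothetical stand-ins for the logged device entries.
    """
    with tempfile.TemporaryDirectory() as scratch:
        dir_fd = os.open(scratch, os.O_RDONLY | os.O_DIRECTORY)
        try:
            # The link is created relative to dir_fd; its target need not exist.
            os.symlink(target, link_name, dir_fd=dir_fd)
            try:
                # Equivalent to openat(dir_fd, link_name, O_RDONLY | O_NOFOLLOW).
                fd = os.open(link_name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
            except OSError as exc:
                return exc.errno
            os.close(fd)
            return 0
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    code = open_symlink_nofollow_errno()
    print(errno.errorcode.get(code, "no error"), os.strerror(code))
```

On a Linux system with glibc this should print the ELOOP name followed by the same "Too many levels of symbolic links" text seen in the log.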
As a test, I started the robinhood-lhsm service without the initial
scan, and it started fine.
The command rbh-lhsm-report --fs-info gives some information, but not
much detail.
The command rbh-lhsm-report -a reports that the file storage has never
been checked, which means a scan is needed.
The filesystem is currently in production, with 2.6 PB of used
capacity. Could this be the main reason the host crashes?
Regards
Langton