[lustre-discuss] RobinHood fail on Lustre IEEL v2.5

Langton langtonn at eclipseholdings.co.za
Thu Jun 8 07:49:45 PDT 2017


I am trying to install Robinhood to manage a 4PB Lustre filesystem.
The environment is as follows:
IEEL Lustre 2.5
Robinhood 2.5.5-2
CentOS release 6.7
kernel - 2.6.32-573.8.1.el6.x86_64
4PB Lustre Filesystem
Robinhood host has 2TB RAM and 390GB disk capacity
FDR InfiniBand fabric network
Failover setup on all Lustre servers

After installing Robinhood, I ran into a problem when I start a scan.
For some reason the RBH host reboots just a few seconds after I issue
the scan command. I have checked the Robinhood logs, but they show only
the following:

2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-event-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/platform-pcspkr-event-spkr: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.2:1.0-event-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.6.1:1.1-mouse: Too many levels of symbolic links
2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on 23/pci-0000:00:1a.0-usb-0:1.5.1:1.1-event: Too many levels of symbolic links
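
In case it helps anyone looking at these entries, here is a minimal
sketch for tallying the openat failures in the log. It assumes each
entry sits on a single line in the actual log file (the wrapping in this
message comes from the mail client) and that entries have the shape
shown above: date, time, a bracketed ID field, a tag, then " | " and the
message. The names LOG_LINE, OPENAT_FAIL and summarize are only
illustrative and are not part of Robinhood.

```python
import re
from collections import Counter

# Entry shape inferred from the log excerpt: date time [ids] tag | message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<ids>[^\]]+)\] (?P<tag>\S+) \| (?P<msg>.*)$"
)
# The failing path may itself contain colons, so the error text is taken
# as everything after the last ": " in the message.
OPENAT_FAIL = re.compile(r"^openat failed on (?P<path>.+): (?P<err>[^:]+)$")


def summarize(lines):
    """Count openat failures per error string and collect the failing paths."""
    errors = Counter()
    paths = []
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        if not entry:
            continue  # skip anything that is not a log entry of this shape
        failure = OPENAT_FAIL.match(entry.group("msg"))
        if failure:
            errors[failure.group("err")] += 1
            paths.append(failure.group("path"))
    return errors, paths


sample = [
    "2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on "
    "23/platform-pcspkr-event-spkr: Too many levels of symbolic links",
    "2017/06/08 16:19:38 [15616/21] FS_Scan | openat failed on "
    "23/pci-0000:00:1a.0-usb-0:1.6.1:1.2-mouse: Too many levels of symbolic links",
]
errors, paths = summarize(sample)
print(errors)
print(paths)
```

To run it against a real log, pass an open file object instead of the
sample list, for example summarize(open(path_to_log)), where path_to_log
is wherever the Robinhood log is written on your host.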

As a test I started the robinhood-lhsm service, and it started fine
without the initial scan.
The command rbh-lhsm-report --fs-info gives some information, but not
much detail.
The command rbh-lhsm-report -a says file storage has never been
checked, which means a scan is needed.
The filesystem is currently in production. Could this be the main reason
why the host crashes?
The filesystem is sitting at 2.6PB of used capacity.

Regards

Langton

