[lustre-discuss] Changelog users failing to clear records in 2.8, can anyone help?

Gibbins, Faye Faye.Gibbins at cirrus.com
Thu Jun 1 07:09:09 PDT 2017


We have four file systems on our Lustre cluster, and each has a changelog user registered for Robinhood to use.

We have discovered that the changelog user on one of the file systems is not catching up to the current index. Manual runs of Robinhood fail to read any further records, even though mdd/tools-MDT0000/changelog_users shows that there are still records to read.
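The lag can be expressed as the gap between the MDT's current changelog index and the last index each registered user has cleared. The Python sketch below computes that gap. It assumes the `current index: N` line followed by one `clN  INDEX` line per user, which is the layout older Lustre releases print for `changelog_users`. Treat that exact layout as an assumption. `changelog_backlog` is a hypothetical helper, and the sample values are illustrative, not taken from our system.

```python
import re


def changelog_backlog(text):
    """Return {user_id: records_not_yet_cleared} parsed from changelog_users output.

    Assumption: a 'current index: N' line, then one 'clN  INDEX' line per
    registered user (header lines such as 'ID  index' are ignored).
    """
    current = None
    users = {}
    for line in text.splitlines():
        m = re.match(r"\s*current index:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            continue
        m = re.match(r"\s*(cl\d+)\s+(\d+)\s*$", line)
        if m:
            users[m.group(1)] = int(m.group(2))
    if current is None:
        raise ValueError("no 'current index' line found")
    # Backlog = records the user still has to consume and clear.
    return {user: current - idx for user, idx in users.items()}


# Illustrative input only (hypothetical values).
sample = """current index: 1050
ID    index
cl1   1050
cl2   900
"""
print(changelog_backlog(sample))  # {'cl1': 0, 'cl2': 150}
```

A user whose backlog stays constant or keeps growing across Robinhood runs is the "not catching up" case described above.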

Over time the changelog filled up and the file system became sluggish. Wiping the Robinhood MySQL database and reinitializing Robinhood with a full scan did not fix the issue. As I said above, the three changelogs of the other file systems (on the same MGS) work fine when read from the same Robinhood instance.

What makes me think this is a Lustre problem (we are using 2.8 on ext4) is this repeated error in syslog:

[Wed May 31 14:06:59 2017] Lustre: 46400:0:(llog.c:530:llog_process_thread()) invalid length -420090294 in llog record for index 372672342/61708
[Wed May 31 14:06:59 2017] LustreError: 46400:0:(mdd_device.c:261:llog_changelog_cancel()) tools-MDD0000: cancel idx 645 of catalog 0x7:10 rc=-22

Deregistering the user from the changelog and registering a new one has not changed the behaviour. We still cannot use the new user to track changes to the file system.

Can anyone offer any advice on how to resolve this issue in the changelog?
If not, can anyone confirm whether taking the file system down for an e2fsck/LFSCK will fix issues with the changelog? I'd settle for being able to clear the whole log and start afresh, if that's possible.

Faye Gibbins
Snr SysAdmin, Unix Lead Architect
Software Systems and Cloud Services
Cirrus Logic | cirrus.com<http://www.cirrus.com/>  | +44 (0) 131 272 7398
