[lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5

Marion Hakanson hakansom at ohsu.edu
Thu Oct 18 23:32:10 PDT 2018


This issue is really kicking our behinds:
https://jira.whamcloud.com/browse/LU-11465

While we're waiting for the issue to get some attention from Lustre developers, does anyone have suggestions for recovering our cluster from this kind of deadlocked, stuck-threads-on-the-MDS (or OSS) situation?  Rebooting the storage servers does not clear the hang: soon after reboot, the MDS ends up with the same number of D-state threads as before (around the same number as we have clients).  It looks to me as though some state stored in the filesystem restores the deadlock as soon as the MDS comes up.

Thanks and regards,

Marion


