[lustre-discuss] Question regarding user access during recovery and journal replay

Marc O'Brien Marc.OBrien at cruk.cam.ac.uk
Tue Mar 14 05:24:59 PDT 2023


Hi,
When I was first taught Lustre file system administration, it was stressed that while a Lustre file system is recovering, and while journal replay is occurring on each host, there should be no user interaction with the file system. Any recovery was done with cluster access denied to HPC users, or when the cluster was deemed quiescent. This seemed to make sense: during journal replay the file system is in a read/write state, but the distributed file system may not yet have reached a stable state. We now have multiple Lustre file systems (two Ext4-based and one ZFS-based), and evicting users or finding a quiescent time is problematic (luckily there are maintenance windows for the routine stuff).
I have searched online and have yet to see in print that there should be no user interaction with Lustre during recovery or journal replay (I may have missed it).
So my question is: is the restriction on cluster user interaction during recovery and journal replay actually a thing?
Thanks in advance for any enlightenment :)
Marc


