[lustre-discuss] Rebooting storage nodes while jobs are running?

Wed Feb 27 07:17:55 PST 2019

Hi all,
Our environment: CentOS 7.6, Lustre 2.12.0 on ZFS 0.7.12, 2 MDS, 7 OSS, 180 clients.

Is it possible to reboot the MDS and OSS servers (e.g. for a new kernel or a
new Lustre release) without affecting jobs running on the client nodes?
A reboot can take up to 15 minutes. Will the clients keep waiting for
the storage nodes to reappear, or will I/O operations return errors?
Is the client's behaviour influenced by the timeout parameter ("lctl get_param timeout")
or by other parameters?
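A minimal Python sketch for collecting these values on a client. It assumes `lctl` is on the PATH and that `lctl get_param` prints one `name=value` line per parameter. Besides `timeout`, it queries `at_min` and `at_max` (the adaptive-timeout bounds); whether those are the other parameters that matter here is an assumption, not something confirmed above:

```python
import subprocess

# "timeout" is the parameter named above; "at_min" and "at_max"
# (adaptive-timeout bounds) are included on the assumption that they
# may also affect how long a client waits for a server.
PARAMS = ["timeout", "at_min", "at_max"]


def parse_get_param(output: str) -> dict:
    """Parse `lctl get_param` output, assumed to be one name=value per line."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' separator
            values[name.strip()] = value.strip()
    return values


def read_timeouts(params=PARAMS) -> dict:
    """Run `lctl get_param` on this node and return the parsed values."""
    result = subprocess.run(
        ["lctl", "get_param", *params],
        capture_output=True, text=True, check=True,
    )
    return parse_get_param(result.stdout)
```

Calling `read_timeouts()` on a client returns the values as strings keyed by parameter name. The parser is kept separate from the `lctl` call so it can be checked without a Lustre installation.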

Kind regards,
Bernd Melchers

-- 
Archiv- und Backup-Service | fab-service at zedat.fu-berlin.de
Freie Universität Berlin