[lustre-discuss] Rebooting storage nodes while jobs are running?
Paul Edmon
pedmon at cfa.harvard.edu
Wed Feb 27 07:54:16 PST 2019
From experience, rebooting the storage nodes is fine: the processes
accessing them will just hang until the nodes are restored. I've done
this many times on our cluster with no ill effect.
That said, I have not tried it with kernel upgrades or Lustre release
changes. Those may do something different and unexpected. Someone else
on the list may have insight on these.
-Paul Edmon-
On 2/27/19 10:17 AM, Bernd Melchers wrote:
> Hi all,
> our environment: CentOS-7.6, lustre-2.12.0 at zfs-0.7.12, 2 mds, 7 ods, 180 clients.
>
> Is it possible to reboot the mds and ods servers (e.g. for a new kernel
> or a new lustre release) without affecting running jobs on the client
> nodes? The reboot can take up to 15 minutes. Will the clients wait for
> the storage nodes to reappear, or will I/O operations get errors?
> Is the behaviour of a client influenced by the timeout parameter
> ("lctl get_param timeout") or by other parameters?
>
> Kind regards
> Bernd Melchers
>
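
For anyone who wants to record the timeout value mentioned in the question before and after a maintenance window, here is a minimal Python sketch. It assumes that "lctl get_param" prints one "name=value" pair per line; the helper names parse_lctl_params and read_lctl_params are hypothetical, and the sample value 100 is made up for illustration. It only reads the parameter and makes no claim about how that parameter affects client behaviour during a server reboot.

```python
import subprocess


def parse_lctl_params(text):
    """Parse 'name=value' lines (assumed `lctl get_param` output) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not name=value
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def read_lctl_params(names=("timeout",)):
    """Run `lctl get_param` for the given names; only works on a Lustre node."""
    result = subprocess.run(
        ["lctl", "get_param", *names],
        capture_output=True, text=True, check=True,
    )
    return parse_lctl_params(result.stdout)


# Illustrative output only; the value 100 is made up, not taken from this thread.
sample_output = "timeout=100\n"
print(parse_lctl_params(sample_output))
```

On a client, read_lctl_params() could be called once before the reboot and once afterwards, and the two dictionaries compared.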