[Lustre-discuss] Unusable ZFS backup MDT/MGS after change

Jesse Stroik jstroik at ssec.wisc.edu
Wed May 21 08:23:20 PDT 2014

We have been testing a method to maintain a backup MDS and MDT using the 
documentation here:


The procedure we follow is commented by Scott and was taken from our own 
internal documentation. Our method is to create a ZFS file system


We then take a recursive snapshot (zfs snapshot -r) and replicate it to 
the backup with zfs send -R and zfs receive. To use the backup, we swap 
its IP addresses and name with the primary's.
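
For readers who want to see the snapshot and send/receive step spelled 
out, here is a minimal sketch. It is not our exact procedure: the pool 
name, snapshot name, and backup host are hypothetical placeholders, and 
the -F on zfs receive is an assumption about how the stream lands on an 
existing pool. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. All names below are hypothetical placeholders.
POOL="${POOL:-mdtpool}"                   # pool holding the MGS/MDT datasets
SNAP="${SNAP:-backup1}"                   # snapshot name
BACKUP_HOST="${BACKUP_HOST:-mds-backup}"  # backup MDS host
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands, do not run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

replicate() {
    # Recursive snapshot of the pool and every descendant dataset.
    run "zfs snapshot -r ${POOL}@${SNAP}"
    # -R sends a replication stream (descendants, snapshots, properties).
    # -F on the receiving side is an assumption, not taken from the post.
    run "zfs send -R ${POOL}@${SNAP} | ssh ${BACKUP_HOST} zfs receive -F ${POOL}"
}

replicate
```

Set DRY_RUN=0 only after checking the printed commands against your own 
pool layout.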

If the backup is swapped and used immediately after the snapshot is 
taken, this method works. However, it does not work if you continue to 
use the original server before migrating to the backup -- the equivalent 
of migrating to a not-perfectly-up-to-date snapshot. In that case, you 
can mount and read from the file system but no newly written files make 
it to the OSTs.

The MDT does show files we attempt to create but their attributes are 
all unknown. We are unable to manipulate the files (rm, mv, etc). The 
error returned is "cannot allocate memory" (LU-4524).

We suspected the configuration logs and so we re-ran writeconf and 
remounted. Same behavior.
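
A writeconf pass on a ZFS-backed MDT generally looks like the sketch 
below. The dataset name and mount point are hypothetical, and this is 
not a transcript of the commands we ran. It prints the commands by 
default rather than executing them.

```shell
#!/bin/sh
# Sketch only. Dataset and mount point are hypothetical placeholders.
MDT_DATASET="${MDT_DATASET:-mdtpool/mdt0}"   # ZFS dataset backing the MGS/MDT
MDT_MOUNT="${MDT_MOUNT:-/mnt/lustre-mdt}"    # where the MDT is mounted
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands, do not run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

regenerate_config_logs() {
    # The target must be unmounted before its configuration logs are erased.
    run "umount ${MDT_MOUNT}"
    # Mark the configuration logs for regeneration on the next mount.
    run "tunefs.lustre --writeconf ${MDT_DATASET}"
    # Remount the MGS/MDT so the logs are rebuilt.
    run "mount -t lustre ${MDT_DATASET} ${MDT_MOUNT}"
}

regenerate_config_logs
```

In a full writeconf the clients and OSTs are unmounted first, the same 
tunefs.lustre step is run on each OST, and the MGS/MDT is mounted before 
the OSTs.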

On our primary MDT right now we have both a working MGS/MDT and also the 
non-working MGS/MDT which we could switch to if testing were requested.

We are running Lustre 2.4.0 on the servers and have tested with 2.4.0 
and 2.1.6 clients.

Jesse Stroik

University of Wisconsin

