[Lustre-discuss] lost data after MDS failover
Gregory Matthews
greg.matthews at diamond.ac.uk
Wed Feb 17 04:15:40 PST 2010
We had an LBUG on our MDS (on 15th Feb) and so attempted a failover to
the 2nd MGS/MDS server. This mounted the MGT fine but hung while
mounting the MDT (longer than 5 minutes).
To resolve the problem I unmounted the MGT and the MDT on a freshly
booted MDS/MGS and mounted the MDT as ldiskfs. I then moved aside the
CATALOGS, OBJECTS and last_rcvd files/dirs, unmounted, and restarted
Lustre (mount -t lustre ....).
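
For anyone trying to follow the sequence, the steps above can be sketched
roughly as the script below. This is only an illustration, not the exact
commands that were run: the device path, both mount points and the
".moved" suffix are hypothetical placeholders, and the script only prints
the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the steps described above. All paths are hypothetical
# placeholders (assumptions), not the real site values.
MDT_DEV="${MDT_DEV:-/dev/mdt_device}"           # assumed MDT block device
LDISKFS_MNT="${LDISKFS_MNT:-/mnt/mdt_ldiskfs}"  # assumed temporary mount point
LUSTRE_MNT="${LUSTRE_MNT:-/mnt/mdt}"            # assumed normal MDT mount point

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

recover_mdt() {
    # Mount the MDT as plain ldiskfs rather than as a Lustre target.
    run mount -t ldiskfs "$MDT_DEV" "$LDISKFS_MNT"
    # Rename (do not delete) the three items so they can be inspected later.
    for f in CATALOGS OBJECTS last_rcvd; do
        run mv "$LDISKFS_MNT/$f" "$LDISKFS_MNT/$f.moved"
    done
    run umount "$LDISKFS_MNT"
    # Bring the MDT back as a Lustre target.
    run mount -t lustre "$MDT_DEV" "$LUSTRE_MNT"
}

recover_mdt
```

Run as-is, it just lists the six commands in order so the sequence can be
checked before anything touches the MDT.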
This brought the file system back OK, but one of our scientists appears
to have lost an entire directory of data from the time the file system
was taken down. The MDS was initially taken out at 1400 (16 Feb) and the
file system was fully back around 1500. The scientist has files in the
directory from 1400 onwards.
Approximately 4000 small files dating from the start of January are
missing. We are running Lustre 1.6.6 with a patched kernel
2.6.18-92.1.10.el5 on the servers; the client is running an unreleased
patchless RH kernel 2.6.18-171.el5 and 1.6.7.2 Lustre modules.
We should have good backups of our metadata, and we also have access to
the removed ldiskfs files, which were simply renamed. The missing files
have fairly predictable names, which might help in tracking down the content.
Is there any hope of recovering the missing files/directory?
GREG
--
Greg Matthews 01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: messages.tmp
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20100217/098aa30d/attachment.txt>