[Lustre-discuss] OST crash with group descriptors

Ms. Megan Larko dobsonunit at gmail.com
Thu Mar 12 20:22:42 PDT 2009


Yay!  I believe I can answer this one.

On Thu, Mar 12, 2009 at 9:08 PM, Mag Gam <magawake at gmail.com> wrote:
> This was a very interesting thread to read. I too have been in the
> same situation and it really stunk! I just went ahead and restored the
> 10T filesystem :-(
>
> Seeing Andreas at work is art :-)

Very true.
>
> I have a question about this:
>
> Would the OP get 5/6 of his DATA or FILES? 5/6 of DATA is useless!
> However, 5/6 of FILES is amazing.  I was under the impression that a file
> would be striped across OSTs even if you don't enable striping.

If one uses the Lustre default stripe count of 1, then one may retrieve
5/6 of the files, each of them whole.
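To make the files-versus-data distinction concrete, here is a toy Python model. It is not Lustre's real allocator, which weighs free space and OST load. It assumes, hypothetically, that file i starts on OST i mod N and occupies stripe_count consecutive OSTs. The function name surviving_fraction is made up for illustration.

```python
"""Toy model: what fraction of files survive losing some OSTs,
as a function of stripe count (hypothetical round-robin placement)."""


def surviving_fraction(num_osts, failed, num_files, stripe_count):
    """Return the fraction of files whose stripes all sit on healthy OSTs.

    Hypothetical placement: file i starts on OST i % num_osts and uses
    stripe_count consecutive OSTs, wrapping around at the end.
    """
    if not 1 <= stripe_count <= num_osts:
        raise ValueError("stripe_count must be between 1 and num_osts")
    failed = set(failed)
    intact = 0
    for i in range(num_files):
        # The set of OSTs holding a stripe of file i.
        osts = {(i + k) % num_osts for k in range(stripe_count)}
        # A file is fully recoverable only if none of its OSTs failed.
        if not osts & failed:
            intact += 1
    return intact / num_files


if __name__ == "__main__":
    # Six OSTs, one of them (OST 3) lost.
    for stripe_count in (1, 2, 6):
        frac = surviving_fraction(6, {3}, 600, stripe_count)
        print(f"stripe_count={stripe_count}: {frac:.3f} of files intact")
```

With one of six OSTs lost, this prints 0.833 for a stripe count of 1, 0.667 for 2, and 0.000 for 6. Striping across every OST leaves no file whole, even though 5/6 of the raw data still exists.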

In our case, we set up Lustre with its default stripe count of one, so
when the files were written out each file went to a single disk array
seen by the RAID controller (the disks were in essentially dumb JBOD
enclosures).  We had two such enclosures fail (well, one failed, and
the second was an "oops", mistaken for the failed unit; JBOD hardware
really is not that bad).  The damaged OSTs were deactivated per the
Lustre Manual (with lctl: look up the device for each failed OST and
deactivate that device).  The remaining OSTs were mounted and, if I
remember correctly, the file system was mounted on a Lustre client.
Deactivating the failed OSTs makes any access to them return a quick
EIO (or some such error) instead of actually attempting the access, so
operations such as searches or copies carry on over the remaining parts
of the system.  A stripe count of 1 causes Lustre to put an entire file
onto one OST.  I understand that this is both a little slower and can
use disk space less efficiently than striping.  As we did not have a
good data backup strategy (we're improving that now), we felt a stripe
count of one was our safest approach to preserve file integrity.
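A minimal Python sketch of the "copy whatever is left" step. It assumes, as described above, that reads of files on deactivated OSTs fail with EIO. Any other error still aborts the copy. The names salvage_copy and copy_file are hypothetical and made up for illustration.

```python
import errno
import os
import shutil


def salvage_copy(src_root, dst_root, copy_file=shutil.copy2):
    """Copy every readable file under src_root into dst_root.

    Files whose copy fails with EIO (assumed to be files on a deactivated
    OST) are recorded and skipped instead of aborting the whole run.
    Returns (copied, failed) as lists of paths relative to src_root.
    """
    copied, failed = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        # Recreate the directory layout on the destination side.
        os.makedirs(os.path.join(dst_root, rel_dir), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst_path = os.path.join(dst_root, rel)
            try:
                copy_file(os.path.join(src_root, rel), dst_path)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise  # only I/O errors count as "lost with the OST"
                # Remove any partially written destination file.
                try:
                    os.remove(dst_path)
                except FileNotFoundError:
                    pass
                failed.append(rel)
            else:
                copied.append(rel)
    return copied, failed
```

Run against the client mount point, the returned failed list would, with a stripe count of 1, presumably correspond to the files that lived on the failed OSTs. The destination must not sit inside the source tree.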

I hope this helps, Mag.  Anyone on the list, please correct me where I
have made inaccurate statements.
>
> TIA

megan
>
> On Tue, Mar 10, 2009 at 11:57 AM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
>> Hi T.H.,
>>
>> I do not envy your situation.   I have been in a very similar
>> scenario.   Andreas Dilger gave me some very good information on
>> deactivating the bad OST and then copying the remaining good files.
>> It worked for me.
>>
>> The thread is archived at:
>> http://osdir.com/ml/file-systems.lustre.user/2008-06/msg00249.html
>>
>> Good Luck,
>> megan
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>


