[lustre-discuss] one ost down

Jongwoo Han jongwoohan at gmail.com
Fri Nov 15 17:50:47 PST 2019


Strange: a drive failure in one RAID set should not be affected by a rebuild
in another RAID set. Each OST has its own RAID set, and they are independent.
Also, AFAIK, DDN arrays are preconfigured to use RAID-6, so each RAID set
should survive up to 2 drive or enclosure failures.
Have you tried replacing the failed drive? If not, you had better do that first.
If it happens again after the rebuild, another thing to try is to shut down the
OSSes and completely power off the entire shelves.

Regards,
Jongwoo Han

On Fri, Nov 15, 2019 at 6:01 PM, Einar Næss Jensen <einar.nass.jensen at ntnu.no>
wrote:

>
> Hello, dear Lustre community.
>
>
> We have a Lustre file system in which one OST is having problems.
>
> The underlying disk array, an old SFA10K from DDN (without support), has
> one RAID set with ca. 1300 bad blocks. The bad blocks appeared when one
> disk in that RAID set failed while a drive in another RAID set was rebuilding.
>
>
> Now.
>
> The OST is offline, and the file system seems usable for new files, while
> old files on the corresponding OST are generating lots of kernel messages on
> the OSS.
>
> Quota information is not available, though.
>
>
> Questions:
>
> May I assume that for new files, everything is fine, since they are not
> using the inactive device anyway?
>
> I tried to run e2fsck on the unmounted OST while jobs were still running
> on the file system, and for a few minutes it seemed like this was working, as
> the file system seemed to come back complete afterwards. After a few minutes
> the OST failed again, though.
>
>
> Any pointers on how to rebuild/fix the OST and get it back would be very much
> appreciated.
>
>
> Pointers on how to regenerate the quota information, which is currently
> unavailable, would also help, with or without the troublesome OST.
>
>
>
>
> Best Regards
>
> Einar Næss Jensen (on flight to Denver)
>
>
>
> --
> Einar Næss Jensen
> NTNU HPC Section
> Norwegian University of Science and Technology
> Address: Høgskoleringen 7i
>          N-7491 Trondheim, NORWAY
> tlf:     +47 90990249
> email:   einar.nass.jensen at ntnu.no
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
Jongwoo Han
+82-505-227-6108