[Lustre-discuss] OSS Node Local Hard Disk Problem

Wojciech Turek wjt27 at cam.ac.uk
Mon Feb 27 10:57:22 PST 2012


Hi Vijesh,

Most likely your oss4 crashed, probably with a kernel panic, because of the
faulty local disk, which I guess holds oss4's OS. The crash stopped the
heartbeat communication between the openais nodes oss3 and oss4, which
triggered fencing and failover.
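To see whether the pending-sector count on oss4's /dev/sda stayed the same or kept rising before the crash, you can tally the smartd reports in the attached /var/log/messages. Below is a minimal Python sketch. It assumes the syslog line format shown in the quoted excerpt, and the command-line usage is hypothetical. A rising count would make a failing disk more likely. A steady count of 2 does not, on its own, prove that the disk caused the panic.

```python
import re
import sys
from collections import defaultdict

# Assumes the smartd syslog format seen in the quoted OSS4 log, e.g.
# "Feb 26 04:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors"
PENDING_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+smartd\[\d+\]:\s+"
    r"Device:\s+(?P<dev>\S+),\s+(?P<count>\d+)\s+"
    r"Currently unreadable \(pending\) sectors"
)


def pending_sector_history(lines):
    """Return {(host, device): [(timestamp, count), ...]} in log order."""
    history = defaultdict(list)
    for line in lines:
        m = PENDING_RE.match(line.strip())
        if m:
            history[(m["host"], m["dev"])].append((m["ts"], int(m["count"])))
    return dict(history)


def is_growing(samples):
    """True if the pending-sector count ever increased between consecutive reports."""
    counts = [count for _, count in samples]
    return any(later > earlier for earlier, later in zip(counts, counts[1:]))


# Hypothetical usage: python pending_sectors.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        for (host, dev), samples in pending_sector_history(f).items():
            trend = "growing" if is_growing(samples) else "stable"
            print(f"{host} {dev}: {len(samples)} reports, "
                  f"latest={samples[-1][1]}, {trend}")
```

Lines that do not match, such as kernel or heartbeat messages, are skipped. You could run the same script on the OSS3 log as a sanity check, where no pending sectors would be expected.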

Best regards,

Wojciech

On 27 February 2012 06:40, VIJESH EK <ekvijesh at gmail.com> wrote:

> Dear Sir,
>
> We have an HPC setup with four OSS servers (OSS1 to OSS4) and two MDS
> nodes (MDS1 and MDS2). It had been running without any problem until
> yesterday. This morning I found that OSS4 was shut down. I checked the
> OSS3 logs and found that it had gone into a fencing state. I have
> switched OSS4 back on and it is now running.
>
> In the OSS4 logs I saw some "unreadable" errors, shown below:
>
> Feb 26 04:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 04:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 05:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 05:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 06:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 06:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
> Feb 26 07:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
>
> /dev/sda is a local hard disk. Is it possible that the node fencing was
> caused by this error?
> Will running e2fsck resolve this issue?
>
>
> I have attached the /var/log/messages files from OSS3 and OSS4.
> Could anybody please analyse the log files and advise me on what to do?
>
> Thanks & Regards,
>
> VIJESH
>

