[Lustre-discuss] OSS Node Local Hard Disk Problem

Wojciech Turek wjt27 at cam.ac.uk
Mon Feb 27 11:00:35 PST 2012


It would also be a good idea to capture your console output, as this
would give us more detail about what actually happened to oss4.
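
To see how the smartd warnings quoted below developed over time, a small
script along these lines could help. This is only a rough sketch: the function
names are made up for illustration, and the regular expression assumes the
syslog line format shown in the quoted log (one unwrapped line per message,
as it would appear in /var/log/messages).

```python
import re

# Assumed line format, taken from the quoted log (unwrapped):
# Feb 26 04:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently unreadable (pending) sectors
PENDING_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+smartd\[\d+\]:\s+"
    r"Device:\s+(?P<device>[^,\s]+),\s+"
    r"(?P<count>\d+)\s+Currently unreadable \(pending\) sectors"
)


def pending_sector_reports(lines):
    """Return (timestamp, host, device, count) for each smartd pending-sector line."""
    reports = []
    for line in lines:
        match = PENDING_RE.match(line)
        if match:
            reports.append(
                (
                    match.group("stamp"),
                    match.group("host"),
                    match.group("device"),
                    int(match.group("count")),
                )
            )
    return reports


def latest_count_per_device(reports):
    """Map (host, device) to the most recently reported pending-sector count."""
    latest = {}
    for _stamp, host, device, count in reports:
        # Later lines overwrite earlier ones, so the last report wins.
        latest[(host, device)] = count
    return latest
```

Passing an open file object for /var/log/messages to pending_sector_reports()
and then calling latest_count_per_device() on the result shows whether the
count is stable or growing. A rising count would support the faulty-disk
theory, but it does not replace the console output.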

On 27 February 2012 18:57, Wojciech Turek <wjt27 at cam.ac.uk> wrote:

> Hi Vijesh,
>
> Most likely oss4 crashed, probably with a kernel panic caused by the faulty
> local disk, which I guess holds oss4's OS. The crash cut off communication
> between the (heartbeat) openais nodes oss3 and oss4, which triggered fencing
> and failover.
>
> Best regards,
>
> Wojciech
>
> On 27 February 2012 06:40, VIJESH EK <ekvijesh at gmail.com> wrote:
>
>> Dear Sir,
>>
>> We have an HPC setup with four OSS servers (OSS1 to OSS4) and two MDS
>> nodes (MDS1 and MDS2). It had been running without any problem until
>> yesterday. This morning I found OSS4 shut down. I checked the OSS3 logs
>> and found that fencing had been triggered. I have switched OSS4 back on
>> and it is now running.
>>
>> In the OSS4 logs I saw some "unreadable" errors, as shown below:
>>
>> Feb 26 04:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 04:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 05:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 05:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 06:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 06:54:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>> Feb 26 07:24:43 oss4 smartd[9306]: Device: /dev/sda, 2 Currently
>> unreadable (pending) sectors
>>
>> /dev/sda is a local hard disk. Is it possible that the node fencing is
>> due to this error?
>> Will running e2fsck resolve this issue?
>>
>> I have attached the /var/log/messages files from OSS3 and OSS4. Could
>> anybody please analyse the logs and advise me on what to do?
>>
>> Thanks & Regards,
>>
>> VIJESH
>>