[Lustre-discuss] [Fwd: Re: Broken client]

Herbert Fruchtl herbert.fruchtl at st-andrews.ac.uk
Fri Nov 19 07:41:34 PST 2010


Thanks guys. It looks like unmounting the "unhealthy" OST filesystem and running an
fsck on it (which found several errors) solved the problem! I still don't
understand why it looked different from different clients...
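
For anyone finding this in the archive, the sequence described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands used here: the device and mount point are placeholders, the function names fsck_ost and run are made up for illustration, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: device and mount point are placeholders, not values from this thread.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fsck_ost() {
    ost_dev="$1"   # block device backing the OST (placeholder)
    ost_mnt="$2"   # where the OST is mounted on the OSS (placeholder)

    # The OST must be unmounted before checking the backing filesystem.
    run umount "$ost_mnt" || return 1

    # First pass: -f forces a full check, -n answers "no" to every
    # question, so nothing on disk is modified.
    run e2fsck -fn "$ost_dev"

    # Second pass: -p repairs only what can be fixed safely without
    # human intervention.
    run e2fsck -fp "$ost_dev"
    status=$?

    # e2fsck exit codes are a bitmask; 4 and above means errors were
    # left uncorrected or the run itself failed.
    if [ "$status" -ge 4 ]; then
        echo "e2fsck exit $status: not remounting $ost_dev" >&2
        return "$status"
    fi

    run mount -t lustre "$ost_dev" "$ost_mnt"
}
```

Calling fsck_ost with a device and mount point prints the four steps; setting DRY_RUN=0 would actually execute them, which should only be done on the OSS that owns the OST and after reading the output of the read-only pass.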

Cheers,

   Herbert

Oleg Drokin wrote:
> Hello!
> 
>   So are there any other complaints on the OSS node when you mount that OST?
>   Did you try to run e2fsck on the OST disk itself (while unmounted)? I assume one of the possible problems is just on-disk fs corruption
>   (and it might show unhealthy due to that right after mount too).
> 
> Bye, 
>     Oleg
> On Nov 18, 2010, at 1:47 PM, Herbert Fruchtl wrote:
> 
>> Sorry, I had meant to cc this to the list.
>>
>>  Herbert
>>
>> From: Herbert Fruchtl <herbert.fruchtl at st-andrews.ac.uk>
>> Date: November 18, 2010 12:56:53 PM EST
>> To: Kevin Van Maren <Kevin.Van.Maren at oracle.com>
>> Subject: Re: [Lustre-discuss] Broken client
>>
>>
>> Hi Kevin,
>>
>> That didn't change anything. Unmounting the OSTs hung (yes, with an LBUG), and I did a hard reboot. It came up again, and the status is as before: on the MDT server, I can see all files (well, I assume it's all of them); on the client in question some files appear broken. The OST is still "not healthy". I am running another lfsck, without much hope. Here's the LBUG:
>>
>> Nov 18 17:05:16 oss1-fs kernel: LustreError: 8125:0:(lprocfs_status.c:865:lprocfs_free_client_stats()) LBU
>>
>>  Herbert
>>
>> Kevin Van Maren wrote:
>>> Reboot the server with the unhealthy OST.
>>> If you look at the logs, there is likely an LBUG that is causing the problems.
>>> Kevin
>>> On Nov 18, 2010, at 9:51 AM, Herbert Fruchtl <herbert.fruchtl at st-andrews.ac.uk> wrote:
>>>>> It looks like you may have corruption on the mdt or an ost, where the
>>>>> objects on an OST can't be found for the directory entry. Have you
>>>>> had a crash recently or run Lustre fsck? You might need to do fsck and
>>>>> delete (unlink) the "broken" files.
>>>>>
>>>> The files do exist (I can see them on the mdt server) and I don't want to delete
>>>> them. There was a crash lately, and I have run an lfsck afterwards (repeatedly,
>>>> actually).
>>>>
>>>>> I suppose it's also possible you're seeing fallout from an earlier LBUG or
>>>>> something. Try 'cat /proc/fs/lustre/health_check' on all the servers.
>>>>>
>>>> There seems to be a problem:
>>>> [root@master ~]# cat /proc/fs/lustre/health_check
>>>> healthy
>>>> [root@master ~]# ssh oss1 'cat /proc/fs/lustre/health_check'
>>>> device home-OST0005 reported unhealthy
>>>> NOT HEALTHY
>>>> [root@master ~]# ssh oss2 'cat /proc/fs/lustre/health_check'
>>>> healthy
>>>> [root@master ~]# ssh oss3 'cat /proc/fs/lustre/health_check'
>>>> healthy
>>>>
>>>> What do I do about the unhealthy OST?
>>>>
>>>> Herbert
>>>> -- 
>>>> Herbert Fruchtl
>>>> Senior Scientific Computing Officer
>>>> School of Chemistry, School of Mathematics and Statistics
>>>> University of St Andrews
>>>> -- 
>>>> The University of St Andrews is a charity registered in Scotland:
>>>> No SC013532
> 
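
The per-server health check quoted above was run by hand against each OSS. A minimal sketch of the same check as a loop is below; the function names is_healthy and check_servers are made up for illustration, and it assumes passwordless ssh to each server and that a healthy server reports exactly the single word "healthy", as in the quoted output.

```shell
#!/bin/sh
# Sketch only: host names are passed as arguments.
HEALTH_FILE=/proc/fs/lustre/health_check

# Succeeds only if the health_check text is exactly "healthy".
is_healthy() {
    [ "$1" = "healthy" ]
}

# Prints one line per host (plus the raw report for unhealthy ones)
# and returns non-zero if any host is not healthy.
check_servers() {
    rc=0
    for host in "$@"; do
        report=$(ssh "$host" "cat $HEALTH_FILE" 2>&1)
        if is_healthy "$report"; then
            echo "$host: healthy"
        else
            echo "$host: NOT HEALTHY"
            echo "$report" | sed 's/^/    /'
            rc=1
        fi
    done
    return $rc
}

# Usage: check_servers oss1 oss2 oss3
```

A failed ssh connection is also reported as not healthy, since its error text will not match; that seemed the safer default for a monitoring check.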



