[Lustre-discuss] [Fwd: Re: Broken client]
Kevin Van Maren
Kevin.Van.Maren at oracle.com
Fri Nov 19 08:30:54 PST 2010
Not sure. Could be some clients had data in their cache, and others
hit the error when they tried to get it from the OST.
Sorry I misunderstood you -- I thought you had already run fsck on the
OSTs.
Kevin
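
For anyone finding this thread later: the fix reported below was to unmount the unhealthy OST and run e2fsck on its disk. A minimal sketch of that sequence follows, written as a dry run that only prints the commands. The function name, the device and mount point arguments, and the choice of a read-only pass (-fn) followed by a preen pass (-fp) are assumptions for illustration, not something stated in the thread; check them against your own setup and e2fsprogs build before running anything.

```shell
#!/bin/sh
# Print (never execute) an offline-check sequence for one OST.
# ost_fsck_plan is a hypothetical helper; both arguments are placeholders.
ost_fsck_plan() {
    dev=$1   # block device backing the OST (assumed, e.g. /dev/sdX)
    mnt=$2   # mount point of that OST on the OSS (assumed)
    if [ -z "$dev" ] || [ -z "$mnt" ]; then
        echo "usage: ost_fsck_plan DEVICE MOUNTPOINT" >&2
        return 2
    fi
    echo "umount $mnt"                       # take the OST offline first
    echo "e2fsck -fn $dev"                   # read-only pass: report, change nothing
    echo "e2fsck -fp $dev"                   # repair pass: fix what is safe to fix
    echo "mount -t lustre $dev $mnt"         # bring the OST back
    echo "cat /proc/fs/lustre/health_check"  # confirm the server reports healthy
}
```

Calling `ost_fsck_plan /dev/sdX /mnt/ost5` prints the five steps in order, so they can be reviewed before being run by hand on the OSS.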
On Nov 19, 2010, at 9:41 AM, Herbert Fruchtl <herbert.fruchtl at st-andrews.ac.uk> wrote:
> Thanks guys. Looks like unmounting the "unhealthy" OST filesystem and
> running an fsck on it (which found several errors) solved the problem! I
> still don't understand why it looked different from different clients...
>
> Cheers,
>
> Herbert
>
> Oleg Drokin wrote:
>> Hello!
>>
>> So are there any other complaints on the OSS node when you mount that OST?
>> Did you try to run e2fsck on the OST disk itself (while unmounted)? I
>> assume one of the possible problems is just on-disk fs corruption
>> (and it might show unhealthy due to that right after mount too).
>>
>> Bye,
>> Oleg
>> On Nov 18, 2010, at 1:47 PM, Herbert Fruchtl wrote:
>>
>>> Sorry, I had meant to cc this to the list.
>>>
>>> Herbert
>>>
>>> From: Herbert Fruchtl <herbert.fruchtl at st-andrews.ac.uk>
>>> Date: November 18, 2010 12:56:53 PM EST
>>> To: Kevin Van Maren <Kevin.Van.Maren at oracle.com>
>>> Subject: Re: [Lustre-discuss] Broken client
>>>
>>>
>>> Hi Kevin,
>>>
>>> That didn't change anything. Unmounting of the OSTs hung (yes,
>>> with an LBUG), and I did a hard reboot. It came up again, and the
>>> status is as before: on the MDT server, I can see all files (well,
>>> I assume it's all); on the client in question some files appear
>>> broken. The OST is still "not healthy". I am running another
>>> lfsck, without much hope. Here's the LBUG:
>>>
>>> Nov 18 17:05:16 oss1-fs kernel: LustreError: 8125:0:
>>> (lprocfs_status.c:865:lprocfs_free_client_stats()) LBU
>>>
>>> Herbert
>>>
>>> Kevin Van Maren wrote:
>>>> Reboot the server with the unhealthy OST.
>>>> If you look at the logs, there is likely an LBUG that is causing
>>>> the problems.
>>>> Kevin
>>>> On Nov 18, 2010, at 9:51 AM, Herbert Fruchtl <herbert.fruchtl at st-andrews.ac.uk> wrote:
>>>>>> It looks like you may have corruption on the mdt or an ost, where the
>>>>>> objects on an OST can't be found for the directory entry. Have you
>>>>>> had a crash recently or run Lustre fsck? You might need to do fsck and
>>>>>> delete (unlink) the "broken" files.
>>>>>>
>>>>> The files do exist (I can see them on the mdt server) and I don't want
>>>>> to delete them. There was a crash lately, and I have run an lfsck
>>>>> afterwards (repeatedly, actually).
>>>>>
>>>>>> I suppose it's also possible you're seeing fallout from an earlier LBUG
>>>>>> or something. Try 'cat /proc/fs/lustre/health_check' on all the servers.
>>>>>>
>>>>> There seems to be a problem:
>>>>> [root at master ~]# cat /proc/fs/lustre/health_check
>>>>> healthy
>>>>> [root at master ~]# ssh oss1 'cat /proc/fs/lustre/health_check'
>>>>> device home-OST0005 reported unhealthy
>>>>> NOT HEALTHY
>>>>> [root at master ~]# ssh oss2 'cat /proc/fs/lustre/health_check'
>>>>> healthy
>>>>> [root at master ~]# ssh oss3 'cat /proc/fs/lustre/health_check'
>>>>> healthy
>>>>>
>>>>> What do I do about the unhealthy OST?
>>>>>
>>>>> Herbert
>>>>> --
>>>>> Herbert Fruchtl
>>>>> Senior Scientific Computing Officer
>>>>> School of Chemistry, School of Mathematics and Statistics
>>>>> University of St Andrews
>>>>> --
>>>>> The University of St Andrews is a charity registered in Scotland:
>>>>> No SC013532
>>>
>>>
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
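
The per-server health check quoted above can be scripted so that one command sweeps every server. This is a rough sketch only: the function names and the host list are hypothetical, and the three output patterns it matches ("healthy", "device ... reported unhealthy", "NOT HEALTHY") are taken from the output shown in this thread, so other Lustre versions may print something different.

```shell
#!/bin/sh
# Classify one server's health_check output, read from stdin.
# Prints "healthy", or one "unhealthy: <device>" line per bad device,
# or "unknown" if the text matches none of the patterns seen in the thread.
# Exit status: 0 healthy, 1 unhealthy, 2 unknown.
classify_health() {
    awk '
        /^device .* reported unhealthy$/ { print "unhealthy: " $2; bad = 1 }
        /^NOT HEALTHY$/                   { bad = 1 }
        /^healthy$/                       { ok = 1 }
        END {
            if (bad) exit 1
            if (ok) { print "healthy"; exit 0 }
            print "unknown"; exit 2
        }'
}

# Run the check on each host given as an argument (host names are assumed
# to be reachable over ssh, as in the thread). Returns non-zero if any
# host is not healthy.
check_hosts() {
    rc=0
    for host in "$@"; do
        printf '%s: ' "$host"
        ssh "$host" 'cat /proc/fs/lustre/health_check' | classify_health || rc=1
    done
    return $rc
}
```

With the servers from this thread, `check_hosts oss1 oss2 oss3` would be the equivalent of the three ssh commands quoted above.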