[Lustre-discuss] [Fwd: Re: Broken client]

Oleg Drokin oleg.drokin@oracle.com
Thu Nov 18 13:16:59 PST 2010


Hello!

  So are there any other complaints on the OSS node when you mount that OST?
  Did you try running e2fsck on the OST disk itself (while unmounted)? I assume one of the possible problems is just on-disk fs corruption
  (and it might show unhealthy due to that right after mount, too).
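  For example (a sketch only; substitute your actual OST device and
  mount point for /dev/sdX and /mnt/ost5):

    umount /mnt/ost5
    e2fsck -fn /dev/sdX    # read-only pass first, answers "no" to all fixes
    e2fsck -fp /dev/sdX    # repair pass, if the first one reports errors

  Be sure to use the e2fsck from the Lustre-patched e2fsprogs.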

Bye, 
    Oleg
On Nov 18, 2010, at 1:47 PM, Herbert Fruchtl wrote:

> Sorry, I had meant to cc this to the list.
> 
>  Herbert
> 
> From: Herbert Fruchtl <herbert.fruchtl@st-andrews.ac.uk>
> Date: November 18, 2010 12:56:53 PM EST
> To: Kevin Van Maren <Kevin.Van.Maren@oracle.com>
> Subject: Re: [Lustre-discuss] Broken client
> 
> 
> Hi Kevin,
> 
> That didn't change anything. Unmounting one of the OSTs hung (yes, with an LBUG), and I did a hard reboot. It came up again, and the status is as before: on the MDT server I can see all files (well, I assume it's all); on the client in question some files appear broken. The OST is still "not healthy". I am running another lfsck, without much hope. Here's the LBUG:
> 
> Nov 18 17:05:16 oss1-fs kernel: LustreError: 8125:0:(lprocfs_status.c:865:lprocfs_free_client_stats()) LBUG
> 
>  Herbert
> 
> Kevin Van Maren wrote:
>> Reboot the server with the unhealthy OST.
>> If you look at the logs, there is likely an LBUG that is causing the problems.
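>> (A quick way to check, assuming syslog on that node lands in
>> /var/log/messages:
>>     grep -i lbug /var/log/messages
>> or "dmesg | grep -i lbug" for anything still in the kernel ring buffer.)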
>> Kevin
>> On Nov 18, 2010, at 9:51 AM, Herbert Fruchtl <herbert.fruchtl@st-andrews.ac.uk> wrote:
>>>> 
>>>> It looks like you may have corruption on the MDT or an OST, where the
>>>> objects on an OST can't be found for the directory entry. Have you
>>>> had a crash recently or run Lustre fsck? You might need to do fsck and
>>>> delete (unlink) the "broken" files.
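>>>> (Illustrative sketch, with the path below as a placeholder: before
>>>> unlinking a suspect file, you can check which OST objects it maps to:
>>>>     lfs getstripe /lustre/path/to/broken-file
>>>> This prints the stripe layout, i.e. the OST indices and object IDs.)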
>>>> 
>>> The files do exist (I can see them on the MDT server), and I don't want to delete
>>> them. There was a crash recently, and I have run lfsck afterwards (repeatedly,
>>> actually).
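>>> (For reference, the distributed check from the 1.8 manual goes roughly
>>> like this, with all device names and paths as placeholders:
>>>     e2fsck -n -v --mdsdb /tmp/mdsdb /dev/MDTDEV                       # on the MDS
>>>     e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostNdb /dev/OSTDEV   # on each OSS
>>>     lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ost1db /tmp/ost2db /lustre   # on a client
>>> -n is read-only; lfsck comes with the Lustre e2fsprogs package.)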
>>> 
>>>> I suppose it's also possible you're seeing fallout from an earlier LBUG or
>>>> something. Try 'cat /proc/fs/lustre/health_check' on all the servers.
>>>> 
>>> There seems to be a problem:
>>> [root@master ~]# cat /proc/fs/lustre/health_check
>>> healthy
>>> [root@master ~]# ssh oss1 'cat /proc/fs/lustre/health_check'
>>> device home-OST0005 reported unhealthy
>>> NOT HEALTHY
>>> [root@master ~]# ssh oss2 'cat /proc/fs/lustre/health_check'
>>> healthy
>>> [root@master ~]# ssh oss3 'cat /proc/fs/lustre/health_check'
>>> healthy
>>> 
>>> What do I do about the unhealthy OST?
>>> 
>>> Herbert
>>> -- 
>>> Herbert Fruchtl
>>> Senior Scientific Computing Officer
>>> School of Chemistry, School of Mathematics and Statistics
>>> University of St Andrews
>>> -- 
>>> The University of St Andrews is a charity registered in Scotland:
>>> No SC013532
> 
> -- 
> Herbert Fruchtl
> Senior Scientific Computing Officer
> School of Chemistry, School of Mathematics and Statistics
> University of St Andrews
> --
> The University of St Andrews is a charity registered in Scotland:
> No SC013532
> 
> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss



