[Lustre-discuss] Broken client

Wang Yibin wang.yibin at oracle.com
Thu Nov 18 06:46:36 PST 2010


Hello,

On 2010-11-18, at 10:03 PM, Herbert Fruchtl wrote:

> I was wrong about only one client having problems. It seems to
> be all of them, except the mds server (see below), so it is a
> problem of the filesystem (not the client) after all.
> 
>> Could you elaborate about how "broken" the files are?
> 
> When I do an 'ls', the filenames are flashing in red (this is
> for example the case for broken symbolic links). Permissions, date
> and owner are missing, like in the middle of the next three
> lines:
> -rw-------   1 root         root    18308319 Jul 16  2009 stat_1247756353.gz
> ?---------   ? ?            ?              ?            ? stat_1248125742.gz
> drwxr-xr-x   2 stephane     ukmhd       4096 Jul  8  2009 stephane
> 
> Attempting to access the file more closely results in an I/O error:
> [root@mhdc ~]# ls -l /workspace/ls-lR_2009-01-20
> ls: /workspace/ls-lR_2009-01-20: Input/output error
> [root@mhdc ~]# cp /workspace/ls-lR_2009-01-20 /tmp
> cp: cannot stat `/workspace/ls-lR_2009-01-20': Input/output error

This looks very much like some OSTs are failing.
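
To see how widespread the damage is, it may help to list every entry in a directory whose stat call fails, which is roughly what the "?" columns in the ls output indicate. The following is only a minimal sketch: the function name find_unstatable and its stat parameter are made up for illustration, and on a filesystem where df already hangs the calls may block rather than return an error.

```python
import errno
import os


def find_unstatable(directory, stat=os.lstat):
    """Return (path, errno_name) for each entry whose stat call fails.

    Hypothetical helper, not part of Lustre. The `stat` parameter only
    exists so the function can be exercised without a broken filesystem.
    """
    failures = []
    # os.listdir only reads the names; it does not stat the entries.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            stat(path)
        except OSError as exc:
            # Translate the numeric code (e.g. 5) to its symbol (e.g. EIO).
            code = errno.errorcode.get(exc.errno, str(exc.errno))
            failures.append((path, code))
    return failures
```

Calling find_unstatable("/workspace") on an affected client should report the same files that ls shows with question marks, together with the error name for each.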


> 
>> 
>> From your description and the error message you provided, I suspect that one (or more) of the OSTs went down. What does `lctl dl` show?
>> 
> The files are accessible from the mds server, and the OSTs seem
> visible from the "broken" clients:
> [root@mhdc ~]# lctl dl
>  0 UP mgc MGC192.168.101.214@tcp 63568484-f714-da05-c5c2-b96db1b22962 5
>  1 UP lov home-clilov-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 4
>  2 UP mdc home-MDT0000-mdc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  3 UP osc home-OST0001-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  4 UP osc home-OST0003-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  5 UP osc home-OST0002-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  6 UP osc home-OST0005-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  7 UP osc home-OST0004-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
>  8 UP osc home-OST0000-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
> 
> Does this help?

I meant the 'lctl dl' output on the OSS servers. Make sure that all of your OSTs are mounted and running properly.
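
If you want to compare the lists mechanically, something like the sketch below could do it. It assumes the six-column layout (index, status, type, name, UUID, refcount) seen in your client output, and it also assumes that each OSS lists its OSTs under their plain names such as home-OST0000. Neither assumption is checked against your version, and all function names here are made up for illustration.

```python
import re


def parse_lctl_dl(text):
    """Parse `lctl dl` output, assuming six whitespace-separated columns."""
    devices = []
    for line in text.splitlines():
        # Tolerate lines pasted from a quoted email ("> " prefixes).
        parts = line.lstrip("> ").split()
        if len(parts) < 6 or not parts[0].isdigit() or not parts[-1].isdigit():
            continue  # skip prompts, blank lines, anything unexpected
        devices.append({
            "index": int(parts[0]),
            "status": parts[1],
            "type": parts[2],
            # Join the middle fields so a name containing spaces still parses.
            "name": " ".join(parts[3:-2]),
            "uuid": parts[-2],
            "refcount": int(parts[-1]),
        })
    return devices


def not_up(devices):
    """Devices whose status column is anything other than UP."""
    return [d for d in devices if d["status"] != "UP"]


def expected_osts(client_devices):
    """OST names the client has an osc device for, e.g. 'home-OST0001'."""
    names = set()
    for d in client_devices:
        m = re.match(r"(.+-OST[0-9a-fA-F]{4})-osc", d["name"])
        if d["type"] == "osc" and m:
            names.add(m.group(1))
    return names


def missing_osts(client_devices, oss_devices):
    """OSTs the client expects that no OSS reports as UP."""
    seen = {d["name"] for d in oss_devices if d["status"] == "UP"}
    return sorted(expected_osts(client_devices) - seen)
```

Feed parse_lctl_dl the saved output from a client and the concatenated output from all OSS nodes; any name returned by missing_osts is an OST the client expects but no server reports as UP.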

> 
>  Herbert
> 
>> On 2010-11-18, at 8:18 PM, Herbert Fruchtl wrote:
>> 
>>> I have a Lustre (1.6.7) system that looks OKish (as far as I can see) from the 
>>> mds and most of the clients. From one client however (the users' login machine) 
>>> it looks broken. Some files are missing, some seem broken, and the df command 
>>> hangs.
>>> 
>>> Rebooting the client doesn't change anything. Is it broken, or is there some 
>>> persistent information that I need to flush? When I do an ls on a partially 
>>> broken directory, I get the following two lines in /var/log/messages:
>>> 
>>> Nov 18 12:13:53 mhdc kernel: [ 7093.751196] LustreError: 10919:0:(file.c:999:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
>>> Nov 18 12:13:53 mhdc kernel: [ 7093.761098] LustreError: 10919:0:(file.c:999:ll_glimpse_size()) Skipped 9 previous similar messages
>>> 
>>> Any ideas how to proceed with the least disruption?
>>> 
>>> Thanks in advance,
>>> 
>>>  Herbert
>>> -- 
>>> Herbert Fruchtl
>>> Senior Scientific Computing Officer
>>> School of Chemistry, School of Mathematics and Statistics
>>> University of St Andrews
>>> --
>>> The University of St Andrews is a charity registered in Scotland:
>>> No SC013532
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
> 
