[Lustre-discuss] Broken client

Herbert Fruchtl herbert.fruchtl at st-andrews.ac.uk
Thu Nov 18 06:03:41 PST 2010


I was wrong about only one client having problems. It seems to
be all of them, except the mds server (see below), so the
problem lies with the filesystem, not the client, after all.

> Could you elaborate about how "broken" the files are?

When I run 'ls', the affected filenames flash in red (as happens,
for example, with broken symbolic links). Permissions, date
and owner are missing, as in the middle of these three
lines:
-rw-------   1 root         root    18308319 Jul 16  2009 stat_1247756353.gz
?---------   ? ?            ?              ?            ? stat_1248125742.gz
drwxr-xr-x   2 stephane     ukmhd       4096 Jul  8  2009 stephane

Accessing such a file directly results in an I/O error:
[root@mhdc ~]# ls -l /workspace/ls-lR_2009-01-20
ls: /workspace/ls-lR_2009-01-20: Input/output error
[root@mhdc ~]# cp /workspace/ls-lR_2009-01-20 /tmp
cp: cannot stat `/workspace/ls-lR_2009-01-20': Input/output error
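
To get a complete list of the affected entries rather than spotting
them by eye, something like the following might do. It is only a
rough sketch: the function name find_unstatable and its injectable
stat argument are made up for illustration, and it assumes that
listing the directory still works (as it does here) while the stat
of an individual entry fails.

```python
import errno
import os


def find_unstatable(directory, stat=os.lstat):
    """Return (name, errno_name) pairs for entries whose metadata cannot be read.

    `stat` defaults to os.lstat so that a dangling symlink is not
    mistaken for an entry the filesystem refuses to stat.
    """
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            stat(path)
        except OSError as exc:
            # Translate the numeric errno (e.g. 5) into its symbolic name (e.g. EIO).
            broken.append((name, errno.errorcode.get(exc.errno, str(exc.errno))))
    return broken
```

Calling find_unstatable("/workspace") on one of the clients should
then return the names that 'ls' shows with question marks, each
paired with the error the kernel reported for it.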

> 
> From your description and the error message you provided, I suspect that one (or more) of the OSTs went down. What does `lctl dl` show?
> 
The files are accessible from the mds server, and the OSTs seem
visible from the "broken" clients:
[root@mhdc ~]# lctl dl
  0 UP mgc MGC192.168.101.214@tcp 63568484-f714-da05-c5c2-b96db1b22962 5
  1 UP lov home-clilov-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 4
  2 UP mdc home-MDT0000-mdc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  3 UP osc home-OST0001-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  4 UP osc home-OST0003-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  5 UP osc home-OST0002-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  6 UP osc home-OST0005-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  7 UP osc home-OST0004-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
  8 UP osc home-OST0000-osc-ffff8100d7ecf000 651d7044-988f-f324-6896-3e09edf8a90b 5
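
In case it is useful for comparing several clients, here is a rough
sketch of how such a listing could be checked mechanically. The
column meanings (index, status, type, name, UUID, reference count)
are my reading of the output, and the names parse_lctl_dl and
summarize are made up for illustration. It only reports what the
listing itself says: which devices are not marked UP, and which OST
indices the client has an osc device for.

```python
import re

# OST indices appear in device names as four hex digits, e.g. "home-OST0003-osc-...".
OST_RE = re.compile(r"-OST([0-9a-fA-F]{4})-")


def parse_lctl_dl(text):
    """Parse `lctl dl` output into a list of dicts, one per device."""
    devices = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first three columns and the last two are fixed; the name sits between them.
        index, status, dev_type, rest = line.split(None, 3)
        name, uuid, refcount = rest.rsplit(None, 2)
        devices.append({
            "index": int(index),
            "status": status,
            "type": dev_type,
            "name": name,
            "uuid": uuid,
            "refcount": int(refcount),
        })
    return devices


def summarize(devices):
    """Return (names of devices not marked UP, sorted OST indices seen via osc devices)."""
    not_up = [d["name"] for d in devices if d["status"] != "UP"]
    ost_indices = sorted(
        int(match.group(1), 16)
        for d in devices
        if d["type"] == "osc"
        for match in [OST_RE.search(d["name"])]
        if match
    )
    return not_up, ost_indices
```

Feeding the captured output of 'lctl dl' from each client through
parse_lctl_dl and summarize would make a device that is not UP, or
a missing OST index, stand out straight away.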

Does this help?

  Herbert

> On 2010-11-18, at 8:18 PM, Herbert Fruchtl wrote:
> 
>> I have a Lustre (1.6.7) system that looks OKish (as far as I can see) from the 
>> mds and most of the clients. From one client however (the users' login machine) 
>> it looks broken. Some files are missing, some seem broken, and the df command 
>> hangs.
>>
>> Rebooting the client doesn't change anything. Is it broken, or is there some 
>> persistent information that I need to flush? When I do an ls on a partially 
>> broken directory, I get the following two lines in /var/log/messages:
>>
>> Nov 18 12:13:53 mhdc kernel: [ 7093.751196] LustreError: 
>> 10919:0:(file.c:999:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
>> Nov 18 12:13:53 mhdc kernel: [ 7093.761098] LustreError: 
>> 10919:0:(file.c:999:ll_glimpse_size()) Skipped 9 previous similar messages
>>
>> Any ideas how to proceed with the least disruption?
>>
>> Thanks in advance,
>>
>>   Herbert
>> -- 
>> Herbert Fruchtl
>> Senior Scientific Computing Officer
>> School of Chemistry, School of Mathematics and Statistics
>> University of St Andrews
>> --
>> The University of St Andrews is a charity registered in Scotland:
>> No SC013532
> 



