[Lustre-discuss] OSS load through the roof

Brock Palen brockp at umich.edu
Fri Jun 27 09:44:28 PDT 2008


Our OSS went crazy today.  It is attached to two OSTs.

The load is normally around 2-4.  Right now it is 123.

I noticed this to be the cause:

root      6748  0.0  0.0     0    0 ?        D    May27   8:57 [ll_ost_io_123]

All of the ll_ost_io threads are stuck in uninterruptible sleep (state D).
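
For anyone wanting to check the same thing on their own OSS, here is a
rough sketch of how the D-state threads could be counted.  It assumes
`ps aux`-style output (STAT in the 8th column, command in the 11th);
the function name count_d_state is just something made up for this
example, not part of any Lustre tool.

```python
from collections import Counter


def count_d_state(ps_text):
    """Count processes in uninterruptible sleep, grouped by command name.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and short or blank lines
        stat, command = fields[7], fields[10]
        if stat.startswith("D"):
            # "[ll_ost_io_123]" -> "ll_ost_io": drop the kernel-thread
            # brackets and the numeric suffix so threads group together.
            name = command.strip("[]").rstrip("0123456789").rstrip("_")
            counts[name] += 1
    return counts
```

Feeding it the text of `ps aux` (for example captured with
subprocess.check_output) gives a per-thread-type count, which makes it
easy to see whether the load is all ll_ost_io threads or something else.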
Has anyone seen this happen before?  Is this caused by a pending disk  
failure?

I ask about a disk system failure because I also see this message:

mptscsi: ioc1: attempting task abort! (sc=0000010038904c40)
scsi1 : destination target 0, lun 0
         command = Read (10) 00 75 94 40 00 00 10 00 00
mptscsi: ioc1: task abort: SUCCESS (sc=0000010038904c40)

and:

Lustre: 6698:0:(lustre_fsfilt.h:306:fsfilt_setattr()) nobackup-OST0001: slow setattr 100s
Lustre: 6698:0:(watchdog.c:312:lcw_update_time()) Expired watchdog for pid 6698 disabled after 103.1261s
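
To see which OSTs are reporting slow operations, something like the
following sketch could pull them out of the syslog.  It assumes each
message sits on a single line in the raw log and follows the
"<fsname>-OSTxxxx: slow <op> <N>s" shape shown above; the pattern and
the function name find_slow_ops are my own guesses for illustration,
not an official Lustre log format specification.

```python
import re

# Matches e.g. "nobackup-OST0001: slow setattr 100s" (format assumed
# from the single message quoted in this post).
SLOW_OP = re.compile(r"(\S+-OST[0-9a-fA-F]{4}): slow (\w+) (\d+)s")


def find_slow_ops(log_text):
    """Return a list of (ost_name, operation, seconds) tuples."""
    results = []
    for line in log_text.splitlines():
        match = SLOW_OP.search(line)
        if match:
            ost, op, secs = match.groups()
            results.append((ost, op, int(secs)))
    return results
```

Sorting the result by OST name would show whether the slow operations
are confined to one OST, which would point at the disk behind it.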

Thanks

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985





