[Lustre-discuss] Lost OSTs, remounted, now /proc/fs/lustre/obdfilter/$UUID/ is empty

Erik Froese erik.froese at gmail.com
Fri Aug 13 17:11:26 PDT 2010


Hello,

We had a problem with our disk controller that required a reboot. Two of
our OSTs remounted and went through the recovery window, but clients
hang when trying to access them. Also, /proc/fs/lustre/obdfilter/$UUID/
is empty for the affected OST UUID.


LDISKFS FS on dm-5, internal journal on dm-5:8
LDISKFS-fs: delayed allocation enabled
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LDISKFS-fs: mounted filesystem dm-5 with ordered data mode
Lustre: 16377:0:(filter.c:990:filter_init_server_data()) RECOVERY:
service scratch-OST0007, 281 recoverable clients, 0 delayed clients,
last_rcvd 55834575088
Lustre: scratch-OST0007: Now serving scratch-OST0007 on
/dev/mapper/ost_scratch_7 with recovery enabled
Lustre: scratch-OST0007: Will be in recovery for at least 5:00, or
until 281 clients reconnect
Lustre: 6799:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
scratch-OST0007: 280 recoverable clients remain
Lustre: 6799:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
Skipped 279 previous similar messages
Lustre: scratch-OST0007.ost: set parameter quota_type=ug
Lustre: 7305:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
scratch-OST0007: 276 recoverable clients remain
Lustre: 7305:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
Skipped 3 previous similar messages
Lustre: 7304:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
scratch-OST0007: 203 recoverable clients remain
Lustre: 7304:0:(ldlm_lib.c:1788:target_queue_last_replay_reply())
Skipped 72 previous similar messages
Lustre: scratch-OST0007: Recovery period over after 0:57, of 281
clients 281 recovered and 0 were evicted.


[root@oss2 ~]# mount | grep lustre
/dev/mapper/ost_scratch_8 on /lustre/scratch/ost_8 type lustre (rw)
/dev/mapper/ost_scratch_9 on /lustre/scratch/ost_9 type lustre (rw)
/dev/mapper/ost_scratch_7 on /lustre/scratch/ost_7 type lustre (rw)

[root@oss2 ~]# ls -l /proc/fs/lustre/obdfilter/scratch-OST0007/
total 0

e2fsck reported an incorrect free inode count and corrected it, but
that didn't change the empty /proc directory.
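In case it's useful, this is roughly the procedure I followed (exact flags from memory; the device name is the one from the mount output above, and the OST was unmounted while e2fsck ran):

```shell
# Unmount the OST first; use the e2fsprogs build shipped for Lustre/ldiskfs.
umount /lustre/scratch/ost_7

e2fsck -fn /dev/mapper/ost_scratch_7   # read-only pass: report problems only
e2fsck -fp /dev/mapper/ost_scratch_7   # preen pass: fix what is safe to fix

mount -t lustre /dev/mapper/ost_scratch_7 /lustre/scratch/ost_7

# Since /proc/fs/lustre/obdfilter/<UUID>/ is empty, I also checked the
# attached OBD devices directly:
lctl dl                                # list obd devices and their setup state
```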

Any ideas? This is Lustre 1.8.3 on RHEL.

Erik
