[Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)

Fri Aug 13 14:54:34 PDT 2010

On 2010-08-13, at 12:29, Adrian Ulrich wrote:
> Pid: 11833, comm: ll_ost_creat_00 Tainted: G      2.6.18-128.7.1.el5_lustre.1.8.1.1 #1
> :ldiskfs:ldiskfs_find_entry+0x1d4/0x5c0
> [<ffffffff88b4bf63>] :ldiskfs:ldiskfs_lookup+0x53/0x290
> [<ffffffff800366e8>] __lookup_hash+0x10b/0x130
> [<ffffffff800e2c9b>] lookup_one_len+0x53/0x61
> [<ffffffff88bd71ed>] :obdfilter:filter_fid2dentry+0x42d/0x730
> [<ffffffff88bd3383>] :obdfilter:filter_statfs+0x273/0x350
> [<ffffffff88bd2f00>] :obdfilter:filter_parent_lock+0x20/0x220
> [<ffffffff88bd7d43>] :obdfilter:filter_precreate+0x843/0x19e0
> [<ffffffff88be1e19>] :obdfilter:filter_create+0x10b9/0x15e0
> [<ffffffff88ba161d>] :ost:ost_handle+0x131d/0x5a70

Two possibilities I can see:
- The MDS sent a very large create request.  Compare the values from:
  mds> lctl get_param osc.*.prealloc_*
  oss> lctl get_param obdfilter.*.last_id

  and see if they match.  If last_id is growing quickly, the thread is busy
  precreating many objects for some reason.  If prealloc_last_id on the MDS
  is much higher for this OST than the OST's own last_id, something is wrong
  in the MDS lov_objids file.

- The on-disk structure of the object directory for this OST is corrupted.
  Run "e2fsck -fp /dev/{ostdev}" on the unmounted OST filesystem.
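
If there are many OSTs, the comparison in the first point can be scripted.
The following is only a rough sketch, not a supported tool: it assumes the
two lctl commands print one "name=value" line per parameter, that the values
are plain decimal integers, and that each parameter name contains the OST
index as "OSTxxxx".  The function names and the sample output are made up
for illustration; paste in the real output from your MDS and OSS.

```python
import re

# Assumption: parameter names embed the OST index as "OST" + 4 hex digits.
OST_RE = re.compile(r"OST[0-9a-fA-F]{4}")


def parse_params(text):
    """Turn 'name=value' lines into a dict, ignoring lines without '='."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            params[name] = value
    return params


def ids_by_ost(params, suffix):
    """Map OST index -> integer value for parameters ending in `suffix`."""
    result = {}
    for name, value in params.items():
        match = OST_RE.search(name)
        if name.endswith(suffix) and match:
            result[match.group(0)] = int(value)
    return result


def compare(mds_text, oss_text):
    """Return prealloc_last_id (MDS) minus last_id (OSS) for each OST."""
    mds = ids_by_ost(parse_params(mds_text), ".prealloc_last_id")
    oss = ids_by_ost(parse_params(oss_text), ".last_id")
    common = sorted(mds.keys() & oss.keys())
    return {ost: mds[ost] - oss[ost] for ost in common}


# Hypothetical sample output, for illustration only.
mds_sample = """\
osc.lustre-OST0000-osc.prealloc_last_id=1000
osc.lustre-OST0000-osc.prealloc_next_id=990
osc.lustre-OST0001-osc.prealloc_last_id=5000000
"""
oss_sample = """\
obdfilter.lustre-OST0000.last_id=1000
obdfilter.lustre-OST0001.last_id=20500
"""

gaps = compare(mds_sample, oss_sample)
for ost, gap in gaps.items():
    print(ost, gap)
```

A gap near zero means the MDS and the OST agree.  A very large positive gap
for one OST is the case described above, where the MDS is asking that OST to
precreate far more objects than it should.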

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.