[Lustre-discuss] ll_ost thread soft lockup

Robin Humble robin.humble+lustre at anu.edu.au
Mon Mar 19 07:27:59 PDT 2012

On Mon, Mar 19, 2012 at 07:28:22AM -0600, Kevin Van Maren wrote:
>You are running 1.8.5, which does not have the fix for the known MD raid5/6 rebuild corruption bug.  That fix was released in the Oracle Lustre 1.8.7 kernel patches.  Unless you already applied that patch, you might want to run a check of your raid arrays and consider an upgrade (at least patch your kernel with that fix).
>md-avoid-corrupted-ldiskfs-after-rebuild.patch in the 2.6-rhel5.series (note that this bug is NOT specific to rhel5).  This fix does NOT appear to have been picked up by whamcloud.

As you say, the md rebuild bug is not RHEL5-specific: it is in all kernels < 2.6.32.

The Whamcloud fix is LU-824, which landed in git a tad after 1.8.7-wc1.

I also asked Red Hat nicely, and they added the same patch to RHEL5.8
kernels, which IMHO is the correct place for a fundamental md fix.

So once Lustre supports RHEL5.8 servers, the patch in Lustre
isn't needed any more.
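
For anyone wanting to flag servers that might still need the patch, here is a
rough sketch of the "< 2.6.32" check in Python. The names FIXED_IN,
mainline_version and may_need_md_patch are made up for illustration. Note it
only compares the upstream version number, so it cannot see backported fixes:
a RHEL5.8 kernel would still be flagged even though it carries the patch.
Treat a True result as "go and check", not as proof the bug is present.

```python
import re

# Assumption taken from the message above: the md rebuild bug
# affects all kernels older than 2.6.32.
FIXED_IN = (2, 6, 32)


def mainline_version(release):
    """Return the leading x.y.z of a kernel release string as a tuple.

    Example input: '2.6.18-194.el5'. A missing third component counts as 0.
    """
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return tuple(int(g or 0) for g in m.groups())


def may_need_md_patch(release):
    """True if the upstream version predates 2.6.32.

    This ignores vendor backports, so it only marks candidates to inspect.
    """
    return mainline_version(release) < FIXED_IN
```

On the server itself you could feed it platform.release() from the standard
library, e.g. may_need_md_patch(platform.release()).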

