[Lustre-discuss] Hung software raid in 2.6.18-92.1.26 + lustre 1.6.7.2

Tim Burgess ozburgess+lustre at gmail.com
Mon Jun 22 22:23:00 PDT 2009


Hi All,

Was wondering if anyone might be able to shed any light on some more
problems we've been seeing since our 1.6.7.2 upgrade over the
weekend...

We've upgraded all the OSSes and the MDS to the SDLC kernel
2.6.18-92.1.26.el5_lustre.1.6.7.2smp, and now it appears that
something is causing the software RAID layer on the OSSes to freeze
completely.

Even:
[root@oss006 md]# dd if=/dev/md2 of=/dev/null bs=1024k count=1

hangs forever.

/dev/md2 is the OST volume (one OST per OSS), but we see the same
effect on /dev/md0 (and presumably /dev/md1).

This of course causes all the Lustre I/O threads to go into D state one
by one and never return:

[root@oss006 ~]# ps -elf | grep D
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
1 D root       250    27  0  75   0 -     0 get_ac Jun21 ?
00:00:02 [pdflush]
1 D root       251    27  0  70  -5 -     0 get_ac Jun21 ?
00:00:01 [kswapd0]
1 D root      3232     1  0  75   0 -     0 log_wa Jun21 ?
00:00:00 [obd_zombid]
1 D root      3288    27  0  70  -5 -     0 sync_b Jun21 ?
00:00:37 [kjournald]
1 D root      3303     1  0  75   0 -     0 get_ac Jun21 ?
00:00:00 [ldlm_cn_00]
1 D root      3305     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ldlm_cn_01]
1 D root      3306     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ldlm_cn_02]
1 D root      3307     1  0  75   0 -     0 get_ac Jun21 ?
00:00:00 [ldlm_cn_03]
1 D root      3308     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ldlm_cn_04]
1 D root      3309     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ldlm_cn_05]
1 D root      3310     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ldlm_cn_06]
....
1 D root      3455     1  0  75   0 -     0 -      Jun21 ?
00:00:00 [ll_evictor]
1 D root      5996     1  0  75   0 -     0 get_ac 12:16 ?
00:00:00 [ldlm_cn_08]
1 D root      5997     1  0  75   0 -     0 -      12:18 ?
00:00:00 [ldlm_cn_09]
1 D root      6020     1  0  75   0 -     0 get_ac 12:41 ?
00:00:00 [ldlm_cn_10]
4 D root      6107     1  0  77   0 - 16819 get_ac 12:53 ?
00:00:00 dd if=/dev/md2 of=/dev/null bs=4096k count=10 skip=10000
4 D root      6138     1  0  78   0 - 16818 get_ac 12:55 ?
00:00:01 dd if=/dev/md0 of=/dev/null bs=4096k count=10 skip=10000
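
The WCHAN column above gets truncated by ps, so if full kernel stack
traces of the blocked tasks would be useful, we can trigger a sysrq
task dump on an affected OSS and post the relevant parts of the log
(assuming sysrq is enabled on these nodes, which I'd need to check):

[root@oss006 ~]# echo 1 > /proc/sys/kernel/sysrq
[root@oss006 ~]# echo t > /proc/sysrq-trigger
[root@oss006 ~]# dmesg > /tmp/sysrq-t.txt

That should show exactly which function in the raid/jbd path everything
is sleeping in.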

If it's relevant - we haven't _yet_ seen this on our newer OSSes,
which are 7+1 RAID5s.  We are only seeing it on the older 5+1s for
now.
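
In case the difference between the two OSS generations matters, the
array geometry is easy to compare; this is what we'd look at on one old
and one new OSS (the stripe_cache_size sysfs attribute should exist for
raid5 on this kernel, but I haven't double-checked that):

[root@oss006 ~]# mdadm --detail /dev/md2 | egrep 'Level|Devices|Chunk'
[root@oss006 ~]# cat /sys/block/md2/md/stripe_cache_size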

Any help would be greatly appreciated!

Thanks again,
Tim


