[Lustre-discuss] "Slow XXXX" Syslog Messages

Charles Taylor taylor at hpc.ufl.edu
Fri Jun 27 03:08:12 PDT 2008


I'm curious about all the "slow xxxx" messages on our OSSs.   It is  
true that the OSSs are getting hammered and the "spindles"  (actually  
RAID 5 LUNs) are saturated (96% to 100% busy in iostat).   However,  
await and svctm numbers are *very* reasonable (in my experience).    
The await times range from 50 to 250 ms and the svctm numbers are  
usually only a few ms.

The cpus are mostly idle or servicing interrupts from the FC HCAs and  
IB HCAs and the MDT threads are mostly sleeping or in disk wait.   The  
answer seems obvious enough but lustre is reporting delays of (that's  
what I assume the "slow whatevers" are about)  on the order of a  
minute which is 2 to 3 orders of magnitude beyond anything I'm seeing  
in I/O stat.

What is Lustre really trying to tell us?    It seems like there may be  
some race condition that is forcing some of these operations to wait  
until one or more timers expire before they can proceed.    Device  
delays alone don't seem to account for it.

Thanks,

Charlie Taylor
UF HPC Center


Jun 24 21:36:01 tebow1 kernel: Lustre: 2003:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST0001: slow setattr 45s
Jun 25 00:20:03 tebow1 kernel: Lustre: 876:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST0008: slow setattr 46s
Jun 25 00:30:00 tebow1 kernel: Lustre: 349:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST000a: slow setattr 57s
Jun 25 02:51:03 tebow1 kernel: Lustre: 31903:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0005: slow i_mutex 30s
Jun 25 02:51:03 tebow1 kernel: Lustre: 31903:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0005: slow brw_start 30s
Jun 25 02:51:03 tebow1 kernel: Lustre: 6559:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST0005: slow journal start 30s
Jun 25 02:51:03 tebow1 kernel: Lustre: 2691:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0005: slow commitrw commit 30s
Jun 25 02:51:03 tebow1 kernel: Lustre: 7681:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0005: slow direct_io 30s
Jun 25 02:52:43 tebow1 kernel: Lustre: 6541:0:(lustre_fsfilt.h: 
205:fsfilt_start_log()) crn-OST0005: slow journal start 81s
Jun 25 02:52:43 tebow1 kernel: Lustre: 32269:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0005: slow journal start 68s
Jun 25 02:52:43 tebow1 kernel: Lustre: 32269:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0005: slow brw_start 68s
Jun 25 02:52:43 tebow1 kernel: Lustre: 6619:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0005: slow journal start 68s
Jun 25 02:52:43 tebow1 kernel: Lustre: 6580:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0005: slow i_mutex 88s
Jun 25 02:52:43 tebow1 kernel: Lustre: 5651:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0005: slow direct_io 83s
Jun 25 02:52:43 tebow1 kernel: Lustre: 869:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0005: slow commitrw commit 88s
Jun 25 02:52:43 tebow1 kernel: Lustre: 30543:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST0005: slow journal start 88s
Jun 25 02:52:43 tebow1 kernel: Lustre: 5654:0:(lustre_fsfilt.h: 
205:fsfilt_start_log()) crn-OST0005: slow journal start 68s
Jun 25 03:17:31 tebow1 kernel: Lustre: 7681:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST000b: slow i_mutex 52s
Jun 25 03:17:31 tebow1 kernel: Lustre: 6598:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST000b: slow journal start 31s
Jun 25 03:17:31 tebow1 kernel: Lustre: 6598:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST000b: slow brw_start 31s
Jun 25 03:17:31 tebow1 kernel: Lustre: 6598:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST000b: slow direct_io 31s
Jun 25 03:17:31 tebow1 kernel: Lustre: 2004:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST000b: slow i_mutex 52s
Jun 25 03:17:31 tebow1 kernel: Lustre: 2004:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST000b: slow brw_start 52s
Jun 25 03:17:31 tebow1 kernel: Lustre: 1633:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST000b: slow commitrw commit 52s
Jun 25 03:19:11 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 25 03:19:12 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 25 03:19:12 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 25 03:19:12 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 25 03:19:22 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 25 03:20:19 tebow1 kernel: Lustre: 7897:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST000b: slow journal start 129s
Jun 25 03:20:19 tebow1 kernel: Lustre: 4272:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST000b: slow i_mutex 168s
Jun 25 03:20:19 tebow1 kernel: Lustre: 17835:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST000b: slow brw_start 154s
Jun 25 03:20:19 tebow1 kernel: Lustre: 869:0:(lustre_fsfilt.h: 
205:fsfilt_start_log()) crn-OST000b: slow journal start 133s
Jun 25 03:20:20 tebow1 kernel: Lustre: 7897:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST000b: slow direct_io 129s
Jun 25 03:20:20 tebow1 kernel: Lustre: 14215:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST000b: slow journal start 157s
Jun 25 03:20:20 tebow1 kernel: Lustre: 6617:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST000b: slow commitrw commit 168s
Jun 25 05:51:53 tebow1 kernel: Lustre: 6625:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0009: slow i_mutex 54s
Jun 25 05:51:53 tebow1 kernel: Lustre: 6625:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0009: slow brw_start 54s
Jun 25 05:51:53 tebow1 kernel: Lustre: 31170:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0009: slow direct_io 54s
Jun 25 05:51:53 tebow1 kernel: Lustre: 31176:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0009: slow commitrw commit 54s
Jun 25 05:51:53 tebow1 kernel: Lustre: 11514:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST0009: slow journal start 54s
Jun 25 05:52:38 tebow1 kernel: Lustre: 6600:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0009: slow journal start 32s
Jun 25 05:52:38 tebow1 kernel: Lustre: 6600:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0009: slow brw_start 32s
Jun 25 05:52:38 tebow1 kernel: Lustre: 4262:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0009: slow i_mutex 43s
Jun 25 05:52:38 tebow1 kernel: Lustre: 6581:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0009: slow direct_io 43s
Jun 25 05:52:38 tebow1 kernel: Lustre: 6581:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0009: slow commitrw commit 43s
Jun 25 05:52:38 tebow1 kernel: Lustre: 18886:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST0009: slow journal start 43s
Jun 25 14:43:31 tebow1 kernel: Lustre: 18335:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST000c: slow setattr 34s
Jun 25 14:43:40 tebow1 kernel: Lustre: 1959:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST000b: slow setattr 46s
Jun 26 04:00:31 tebow1 kernel: Lustre: 6525:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0003: slow journal start 36s
Jun 26 04:00:31 tebow1 kernel: Lustre: 29138:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 40s
Jun 26 04:00:31 tebow1 kernel: Lustre: 29138:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 40s
Jun 26 04:00:31 tebow1 kernel: Lustre: 6525:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 36s
Jun 26 04:00:31 tebow1 kernel: Lustre: 24264:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 40s
Jun 26 04:01:35 tebow1 kernel: Lustre: 885:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 41s
Jun 26 04:01:35 tebow1 kernel: Lustre: 31901:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0003: slow journal start 30s
Jun 26 04:01:35 tebow1 kernel: Lustre: 31901:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 30s
Jun 26 04:01:35 tebow1 kernel: Lustre: 24269:0:(lustre_fsfilt.h: 
296:fsfilt_commit_wait()) crn-OST0003: slow journal start 40s
Jun 26 04:01:35 tebow1 kernel: Lustre: 24269:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 40s
Jun 26 04:01:35 tebow1 kernel: Lustre: 31901:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 30s
Jun 26 04:02:06 tebow1 kernel: Lustre: 884:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 31s
Jun 26 04:02:06 tebow1 kernel: Lustre: 32337:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 31s
Jun 26 04:04:22 tebow1 kernel: Lustre: 6617:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 30s
Jun 26 04:04:22 tebow1 kernel: Lustre: 6617:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 30s
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615ae>]  
__mutex_lock_slowpath+0x5d/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:04 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:05 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:05 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:06:44 tebow1 kernel: Lustre: 31167:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 140s
Jun 26 04:06:44 tebow1 kernel: Lustre: 7680:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0003: slow journal start 119s
Jun 26 04:06:44 tebow1 kernel: Lustre: 7680:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 119s
Jun 26 04:06:44 tebow1 kernel: Lustre: 13743:0:(lustre_fsfilt.h: 
205:fsfilt_start_log()) crn-OST0003: slow journal start 61s
Jun 26 04:06:44 tebow1 kernel: Lustre: 7680:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 119s
Jun 26 04:06:44 tebow1 kernel: Lustre: 14979:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 140s
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:08:24 tebow1 kernel:  [<ffffffff800615a6>]  
__mutex_lock_slowpath+0x55/0x90
Jun 26 04:11:37 tebow1 kernel: Lustre: 18876:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 293s
Jun 26 04:11:37 tebow1 kernel: Lustre: 7005:0:(lustre_fsfilt.h: 
240:fsfilt_brw_start_log()) crn-OST0003: slow journal start 260s
Jun 26 04:11:37 tebow1 kernel: Lustre: 7005:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 260s
Jun 26 04:11:37 tebow1 kernel: Lustre: 29130:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 263s
Jun 26 04:11:37 tebow1 kernel: Lustre: 6511:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 293s
Jun 26 04:14:20 tebow1 kernel: Lustre: 2704:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 44s
Jun 26 04:14:20 tebow1 kernel: Lustre: 2704:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 44s
Jun 26 04:14:20 tebow1 kernel: Lustre: 31905:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 44s
Jun 26 04:14:20 tebow1 kernel: Lustre: 31905:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 44s
Jun 26 11:07:40 tebow1 kernel: Lustre: 6610:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 47s
Jun 26 11:07:40 tebow1 kernel: Lustre: 6610:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 47s
Jun 26 11:07:44 tebow1 kernel: Lustre: 6506:0:(filter_io_26.c: 
698:filter_commitrw_write()) crn-OST0003: slow i_mutex 31s
Jun 26 11:07:45 tebow1 kernel: Lustre: 885:0:(filter_io_26.c: 
763:filter_commitrw_write()) crn-OST0003: slow direct_io 31s
Jun 26 11:07:45 tebow1 kernel: Lustre: 885:0:(filter_io_26.c: 
776:filter_commitrw_write()) crn-OST0003: slow commitrw commit 31s
Jun 26 11:07:45 tebow1 kernel: Lustre: 6506:0:(filter_io_26.c: 
711:filter_commitrw_write()) crn-OST0003: slow brw_start 31s
Jun 26 12:42:23 tebow1 kernel: Lustre: 14215:0:(lustre_fsfilt.h: 
306:fsfilt_setattr()) crn-OST0007: slow setattr 56s




More information about the lustre-discuss mailing list