[Lustre-discuss] Slow warnings

Fri Aug 28 12:12:34 PDT 2009

On Aug 28, 2009, at 3:08 PM, Brian J. Murrell wrote:

> On Fri, 2009-08-28 at 15:00 -0400, Scott Atchley wrote:
>> Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write())
>> lustre-OST0000: slow i_mutex 30s
>> Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait())
>> lustre-OST0000: slow journal start 30s
>> Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write())
>> lustre-OST0000: slow commitrw commit 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write())
>> lustre-OST0000: slow direct_io 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write())
>> Skipped 4 previous similar messages
>>
>> Should I be concerned or is this normal?
>
> It means that I/Os are completing more slowly than Lustre would like,
> which, as you can guess, means you are hammering the disk(s) too hard.
> Try reducing the number of OST threads.  Ideally you want those messages
> to go away even when you are pushing the OSTs to capacity, so you want
> just enough OST threads to push the disks to capacity but no more.
> So measure, reduce, measure.  If the throughput is the same or better
> after reducing, reduce further and measure again.  Repeat until you have
> found the sweet spot.
>
> Obdfilter-survey in the iokit automates this for you, running many tests
> at different thread counts and letting you see where the sweet spot is
> without all the iterating.
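
For reference, the "measure, reduce, measure" loop quoted above can be
sketched roughly as follows. This is only an illustration, not how
obdfilter-survey works: find_sweet_spot, the measure callback, the 2%
tolerance, and the throughput numbers are all hypothetical. In practice
measure would run a real benchmark at the given OST thread count.

```python
from typing import Callable, Iterable


def find_sweet_spot(measure: Callable[[int], float],
                    candidates: Iterable[int],
                    tolerance: float = 0.02) -> int:
    """Walk thread counts from high to low and return the lowest count
    whose throughput is still the same or better (within tolerance)."""
    counts = sorted(candidates, reverse=True)
    if not counts:
        raise ValueError("need at least one candidate thread count")

    best_threads = counts[0]
    best_throughput = measure(best_threads)

    for threads in counts[1:]:
        throughput = measure(threads)
        # Throughput dropped noticeably: the previous count was the sweet spot.
        if throughput < best_throughput * (1.0 - tolerance):
            break
        # Same or better: keep reducing.
        best_threads = threads
        best_throughput = max(best_throughput, throughput)

    return best_threads


# Made-up throughput figures (MB/s) keyed by thread count, for illustration.
simulated = {512: 300.0, 256: 310.0, 128: 320.0, 64: 318.0, 32: 250.0}
best = find_sweet_spot(lambda n: simulated[n], simulated.keys())
print(best)  # prints 64
```

With these made-up numbers the loop stops at 64 threads, because going
down to 32 costs far more than the tolerance allows.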

Hi Brian,

Thanks for the description.

Since I am mainly testing for correctness of MXLND, I am not worried  
about hammering my test disk. I will keep this in mind in case I get a  
big fat RAID this Christmas. ;-)

Scott