[Lustre-discuss] slow direct_io , slow journal .. in OST log

Erik Froese erik.froese at gmail.com
Mon Jan 25 07:52:08 PST 2010


Is each OST's journal on its own physical disk? I've seen those messages when
there isn't enough hardware dedicated to the journal device.
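
A quick way to check (assuming the OSTs are ldiskfs-backed) is to dump the
superblock and look for an external journal, e.g.:

  dumpe2fs -h /dev/drbd3 | grep -i journal

If that prints only a "Journal inode" line and no "Journal UUID" / "Journal
device", the journal shares spindles with the OST data, which makes these
slow-journal warnings much more likely under load.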
Erik

On Sun, Jan 24, 2010 at 11:43 PM, Aaron Knister <aaron.knister at gmail.com> wrote:

> I don't necessarily think there's anything wrong with using DRBD or running
> it over Gigabit Ethernet. If you stop all I/O to the Lustre filesystem, what
> does hdparm -t show on the sdc and drbd devices? Do you have any
> performance numbers for the DRBD or underlying RAID devices?
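>
> For example, with all Lustre I/O quiesced (hdparm -t reads straight from
> the block device):
>
>   hdparm -t /dev/sdc
>   hdparm -t /dev/drbd3
>
> Comparing the two should show roughly how much overhead DRBD adds on top
> of the raw array.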
>
> On Jan 24, 2010, at 11:17 PM, Lex wrote:
>
> Thank you for your fast reply, Aaron
>
> I'm using Gigabit Ethernet to synchronize data to our fail-over node.
> Is there something wrong with that? Please tell me.
>
> On Mon, Jan 25, 2010 at 10:35 AM, Aaron Knister <aaron.knister at gmail.com> wrote:
>
>> My best guess (and please correct me if I'm wrong) is that those messages
>> appear because the underlying block devices are slow to respond to I/O
>> requests. It looks like you're using DRBD. What's your interconnect?
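>>
>> (For what it's worth, cat /proc/drbd will show the replication protocol
>> and whether a resync is running; with protocol C over a single GigE link,
>> write throughput is capped at roughly 100 MB/s per link before any resync
>> traffic is taken into account.)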
>>
>> On Jan 24, 2010, at 9:42 PM, Lex wrote:
>>
>> Hi list
>>
>> I have one OSS with hardware like this:
>>
>> CPU: Intel(R) Xeon E5420 2.5 GHz
>> Chipset: Intel 5000P
>> RAM: 8 GB
>>
>> On this OSS we are using 2 RAID-5 arrays as OSTs (each has 4 x 1.5 TB hard
>> drives on an Adaptec 5805 RAID controller).
>>
>> It worked quite smoothly before, but about 2 weeks ago I started seeing
>> many warnings (I think they are warnings) like this in /var/log/messages:
>>
>> Jan 25 08:41:23 OST6 kernel: Lustre:
>> 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 35s
>> Jan 25 08:41:34 OST6 kernel: Lustre:
>> 9608:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 41s
>> Jan 25 08:41:34 OST6 kernel: Lustre:
>> 9608:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 2 previous
>> similar messages
>> Jan 25 08:41:35 OST6 kernel: Lustre:
>> 9645:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 43s
>> Jan 25 08:58:10 OST6 kernel: Lustre:
>> 9646:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 31s
>> Jan 25 08:59:39 OST6 kernel: Lustre:
>> 9609:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 30s
>> Jan 25 09:01:05 OST6 kernel: Lustre:
>> 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 33s
>> Jan 25 09:03:23 OST6 kernel: Lustre:
>> 9633:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 32s
>> Jan 25 09:11:25 OST6 kernel: Lustre:
>> 9585:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow
>> direct_io 36s
>>
>> I googled around and found that this can be caused by too many
>> oss_num_threads. Even though the formula I found in the 1.8 manual
>> (thread_number = RAM * CPU cores / 128 MB) gives 256 for this box, I
>> brought it down to 64:
>>
>> options ost oss_num_threads=64
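>>
>> (For reference, assuming a single quad-core E5420, the manual's formula
>> works out to 8192 MB * 4 cores / 128 MB = 256 threads, so 64 is already a
>> quarter of the computed value.)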
>>
>> It still didn't help.
>>
>> I thought it was only a harmless warning, but maybe I am wrong: our
>> performance has gone down quite heavily. (There may be some other reason,
>> but for now the slow direct_io problem is my main suspect.)
>>
>> iostat -m 1 1
>> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (OST6)      01/25/2010
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.01    0.02    2.86   25.01    0.00   72.10
>>
>> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
>> sda               1.30         0.01         0.00      11386       3469
>> sdb               1.30         0.01         0.00      11531       3469
>> sdc             131.50        12.40         0.26   11793218     249934
>> sdd             178.46        18.00         0.26   17124065     250334
>> md2               3.33         0.02         0.00      22915       2634
>> md1               0.00         0.00         0.00          0          0
>> md0               0.00         0.00         0.00          0          0
>> drbd3           480.10        12.39         0.26   11789047     249639
>> drbd6           565.85        14.89         0.26   14168452     249211
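>>
>> If it helps, I can also collect extended statistics, e.g.:
>>
>>   iostat -mx 1
>>
>> so the await and %util columns show whether sdc/sdd are actually
>> saturated rather than just how many MB/s they move.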
>>
>>
>> So, could anyone please tell me whether this warning impacts our system
>> performance or not? And if it does, please give me a solution or some
>> advice on how to resolve it.
>>
>> Best regards