<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">My best guess (and please correct me if I'm wrong) is that those messages appear because the underlying block devices are slow to respond to I/O requests. It looks like you're using DRBD. What's your interconnect? <div><br><div><div>On Jan 24, 2010, at 9:42 PM, Lex wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi list <br><br>I have one OSS with the following hardware: <br><br>CPU: Intel(R) Xeon E5420 2.5 GHz<br>Chipset: Intel 5000P <br>8 GB RAM <br><br>On this OSS, we are using 2 RAID-5 arrays as OSTs (each has 4 x 1.5 TB hard drives on an Adaptec 5805 RAID controller) <br>
<br>It worked quite smoothly before, but about 2 weeks ago I started seeing many warnings (at least I think they are warnings) like this in /var/log/messages: <br><br><i>Jan 25 08:41:23 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 35s<br>
Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 41s<br>Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 2 previous similar messages<br>
Jan 25 08:41:35 OST6 kernel: Lustre: 9645:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 43s<br>Jan 25 08:58:10 OST6 kernel: Lustre: 9646:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 31s<br>
Jan 25 08:59:39 OST6 kernel: Lustre: 9609:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 30s<br>Jan 25 09:01:05 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 33s<br>
Jan 25 09:03:23 OST6 kernel: Lustre: 9633:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 32s<br>Jan 25 09:11:25 OST6 kernel: Lustre: 9585:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 36s</i><br>
<br>I googled around and found that this can be caused by oss_num_threads, so I brought it down to 64 (the formula I found in the 1.8 manual, thread_number = RAM * CPU cores / 128 MB, gives a value of 256): <br>
<br><i>options ost oss_num_threads=64</i><br><br>It still didn't help. <br><br>I thought it was only a harmless warning, but I may be wrong: our performance has dropped quite heavily (there may be another reason, but for now I only suspect the slow direct_io problem) <br>
<br>iostat -m 1 1<br>Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (OST6) 01/25/2010<br><br>avg-cpu: %user %nice %system %iowait %steal %idle<br> 0.01 0.02 2.86 25.01 0.00 72.10<br><br>Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn<br>
sda 1.30 0.01 0.00 11386 3469<br>sdb 1.30 0.01 0.00 11531 3469<br>sdc 131.50 <b>12.40</b> 0.26 11793218 249934<br>
sdd 178.46 <b>18.00</b> 0.26 17124065 250334<br>md2 3.33 0.02 0.00 22915 2634<br>md1 0.00 0.00 0.00 0 0<br>
md0 0.00 0.00 0.00 0 0<br>drbd3 480.10 <b>12.39</b> 0.26 11789047 249639<br>drbd6 565.85 <b>14.89</b> 0.26 14168452 249211<br>
<br><br>So, could anyone please tell me whether this warning affects our system performance? If it does, please give me a solution or some advice on how to resolve it. <br><br>Best regards <br><br>
</blockquote></div><br></div></body></html>