Hi,<br><br>What does this error mean?<br><br>LustreError: 1545:0:(ldlm_lockd.c:584:ldlm_server_completion_ast()) ### enqueue wait took 114978964us from 1296131766 ns: filter-ost1_UUID lock: df9db940/0x64a022400922c49f lrc: 2/0,0 mode: PR/PR res: 27991312/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0 remote: 0x943ed9a07b21c6f7 expref: 2540 pid: 1649<br>

LustreError: 1545:0:(ldlm_lockd.c:584:ldlm_server_completion_ast()) ### enqueue wait took 117994395us from 1296131763 ns: filter-ost1_UUID lock: c4158240/0x64a022400922c43d lrc: 2/0,0 mode: PR/PR res: 27991312/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0 remote: 0xc4d09fb49f00db51 expref: 2519 pid: 1645<br>

LustreError: 1553:0:(ldlm_lockd.c:584:ldlm_server_completion_ast()) ### enqueue wait took 144547977us from 1296131797 ns: filter-ost1_UUID lock: f7f36a40/0x64a022400922ca2c lrc: 2/0,0 mode: PR/PR res: 28137827/0 rrc: 4 type: EXT [0->18446744073709551615] (req 4096->8191) flags: 0 remote: 0xc4d09fb49f045a40 expref: 2530 pid: 1636<br>

LustreError: 1541:0:(ldlm_lockd.c:584:ldlm_server_completion_ast()) ### enqueue wait took 104761600us from 1296131888 ns: filter-ost1_UUID lock: ee4ad040/0x64a022400922dd03 lrc: 2/0,0 mode: PR/PR res: 27991312/0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->8191) flags: 0 remote: 0x943ed9a07b303bd5 expref: 2555 pid: 1653<br>
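
The "enqueue wait took" figures above are in microseconds, so they are easier to judge once converted to seconds. Below is a minimal Python sketch that pulls the wait time and the "from" value out of such a line. The function name parse_enqueue_wait is made up for illustration, and treating the "from" value as a Unix timestamp in seconds is an assumption, not something these logs confirm.

```python
import re
from datetime import datetime, timezone

# Matches the "enqueue wait took <N>us from <T>" fragment of the log line.
ENQUEUE_RE = re.compile(r"enqueue wait took (\d+)us from (\d+)")


def parse_enqueue_wait(line):
    """Return (wait_seconds, enqueue_time_utc), or None if the line does not match."""
    match = ENQUEUE_RE.search(line)
    if match is None:
        return None
    # The wait is logged in microseconds ("us").
    wait_seconds = int(match.group(1)) / 1_000_000
    # Assumption: the value after "from" is a Unix timestamp in seconds.
    enqueued_at = datetime.fromtimestamp(int(match.group(2)), tz=timezone.utc)
    return wait_seconds, enqueued_at


# First part of the first LustreError line quoted above.
line = ("LustreError: 1545:0:(ldlm_lockd.c:584:ldlm_server_completion_ast()) "
        "### enqueue wait took 114978964us from 1296131766 ns: filter-ost1_UUID")
wait_seconds, enqueued_at = parse_enqueue_wait(line)
print(f"waited {wait_seconds:.1f} s, enqueued at {enqueued_at.isoformat()}")
```

For the first line this gives a wait of roughly 115 seconds. If the timestamp assumption holds, it also gives a wall-clock time that can be lined up with other logs.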

LustreError: 1633:0:(ldlm_lib.c:557:target_handle_connect()) @@@ UUID 'ost2_UUID' is not available  for connect (no target)  req@d001ba00 x16180978/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc 0/0<br>

LustreError: 1633:0:(ldlm_lib.c:1318:target_send_reply_msg()) @@@ processing error (-19)  req@d001ba00 x16180978/t0 o8-><?>@<?>:-1 lens 240/0 ref 0 fl Interpret:/0/0 rc -19/0<br>Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 32s<br>

Lustre: 1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 32s<br>Lustre: 1712:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 36s<br>Lustre: 1712:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 36s<br>

Lustre: 1695:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 46s<br>Lustre: 1695:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 46s<br><br><br><div class="gmail_quote">On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <span dir="ltr"><<a href="mailto:brian@whamcloud.com">brian@whamcloud.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:<br>

><br>

<br>

Your logs don't have timestamps so it's difficult to correlate events<br>

but did you notice right before you started getting these messages:<br>

<br>

<br>

> Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s<br>

> Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s<br>

> Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s<br>

> Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s<br>

<br>

You got this:<br>

<br>

> drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).<br>

> drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)<br>

> drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected<br>

<br>

I'm no DRBD expert by a long shot but that looks to me like you had a<br>

disk in the MDS re-syncing to its DRBD partner.  If that disk is the<br>

MDT, a resync is, of course, going to slow down the MDT.<br>
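
To put the resync in perspective, here is a small Python sketch of the arithmetic behind the DRBD lines quoted above. It uses only the two figures from the log (KB to sync and total seconds); the variable names are illustrative. The plain division comes out slightly above the 6520 K/sec DRBD printed, presumably because DRBD rounds or averages differently, which is an assumption.

```python
# Figures copied from the quoted drbd0 log lines.
total_kb = 634_747_844   # "need to sync 634747844 KB"
elapsed_s = 97_313       # "total 97313 sec"

# Average throughput over the whole resync, and its duration in hours.
rate_kb_per_s = total_kb / elapsed_s
hours = elapsed_s / 3600

print(f"resync ran for about {hours:.1f} hours at about {rate_kb_per_s:.0f} KB/s")
```

So the disk was busy resyncing for a little over a day, which is a long window in which the MDT could be slow.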

<br>

The problem here is that you are probably tuned (i.e. the number of<br>

threads) to expect full performance out of the hardware and when it's<br>

under a resync load, it won't deliver it.<br>

<br>

Unfortunately at this point Lustre will push its thread count higher if<br>

it can determine it can get more performance out of a target but it won't<br>

back off when things slow down (i.e. because the disk is being<br>

commandeered for housekeeping tasks such as resync or raid rebuild,<br>

etc.), so you need to set your maximum thread count to what performs well<br>

while your disks are under a resync load.<br>

<br>

Please see the operations manual for details on tuning thread counts for<br>

performance.<br>

<br>

Cheers,<br>

<font color="#888888">b.<br>

<br>

</font><br>

<br></blockquote></div><br><br clear="all"><br>-- <br>Regards<br><br>Nauman Yousuf<br>0321-2549206<br>E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic, E-Energetic, R-Resourcefull --- ENGINEER<br>