<br>Guys, I'm still having issues. Somehow my client and OSS both start getting CPU load when this happens.<br><br>The OSS says:<br><br>LustreError: 1538:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103 from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937<br>

LustreError: 1560:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103 from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937<br>LustreError: 1714:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017031<br>

LustreError: 1708:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017031<br>LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017040<br>

LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) Skipped 10 previous similar messages<br>LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017174<br>LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) Skipped 5 previous similar messages<br>

LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28016970<br>LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) Skipped 12 previous similar messages<br>LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017244<br>

LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) Skipped 17 previous similar messages<br>LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017244<br>LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) Skipped 48 previous similar messages<br>

drbd1: [ll_ost_io_23/1690] sock_sendmsg time expired, ko = 4294967295<br>Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s<br>Lustre: 1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 30s<br>
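Lustre rate-limits repeated console messages, so most of the OSS errors above are hidden behind "Skipped N previous similar messages" lines. Below is a minimal Python sketch for tallying how often each source location actually fired. It assumes the log has been saved as plain text. The script name and the function `count_by_site` are hypothetical and are not part of any Lustre tool.

```python
import re
from collections import Counter

# Matches Lustre console lines such as
#   LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW ...
#   Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s
LINE_RE = re.compile(
    r"^(?:LustreError|Lustre): (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)
# Rate-limited repeats are summarised by the same call site as "Skipped N ...".
SKIPPED_RE = re.compile(r"^Skipped (?P<n>\d+) previous similar messages?$")


def count_by_site(lines):
    """Count how often each (file, line, function) call site fired,
    adding the hidden repeats reported by "Skipped N" lines."""
    counts = Counter()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # not a Lustre console line (e.g. drbd messages, blanks)
        site = (m["file"], int(m["line"]), m["func"])
        skipped = SKIPPED_RE.match(m["msg"])
        counts[site] += int(skipped["n"]) if skipped else 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_sites.py oss.log
    for path in sys.argv[1:]:
        with open(path) as fh:
            text = fh.read().replace("<br>", "\n")  # mail-archive copies use <br>
        for (src, lineno, func), n in count_by_site(text.splitlines()).most_common():
            print(f"{n:6d}  {src}:{lineno} {func}()")
```

Run over the OSS excerpt above (after splitting on the <br> tags), this counts filter_preprw_write() at 99 occurrences (7 printed plus 92 skipped), compared with 2 for ldlm_cancel_handler(). It does not recover the missing timestamps.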

<br>10.65.200.37 is my Lustre client. On the client I see:<br><br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 4 previous similar messages<br>LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>

LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) Skipped 3 previous similar messages<br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c229d200 x1219484/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2<br>

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages<br>LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>

LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message<br>LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2<br>

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages<br>LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>

LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2<br>

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages<br>LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>

LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages<br>LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages<br>
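The numbers in the client log are Linux errno values (negated in "err == -2"). The "o4" and "operation 103" fields are RPC opcodes. Below is a small Python sketch that decodes them. The errno names come from Python's standard library. The opcode table is my recollection of Lustre's lustre_idl.h (OST_WRITE = 4, LDLM_CANCEL = 103) and is an assumption that should be checked against your source tree.

```python
import errno
import os

# Opcode numbers as I recall them from lustre_idl.h in Lustre 1.6/1.8.
# This table is an assumption; verify it against your own source tree.
OPCODES = {
    4: "OST_WRITE",      # the "o4" in "o4->ost2_UUID" on the client
    103: "LDLM_CANCEL",  # "operation 103" in the OSS ldlm_cancel_handler() message
}


def decode_errno(rc: int) -> tuple:
    """Map a (possibly negated) Linux errno from a Lustre log to (name, text)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def decode_opcode(op: int) -> str:
    """Look up an opcode in the (assumed) table above."""
    return OPCODES.get(op, f"unknown opcode {op}")


if __name__ == "__main__":
    print(decode_errno(-2))   # from "err == -2" in ptlrpc_check_status()
    print(decode_errno(116))  # from "ldlm_cli_cancel failed: 116"
    print(decode_opcode(4), decode_opcode(103))
```

Read this way, the two sides appear to line up. The client's OST_WRITE RPCs to ost2 fail with ENOENT while the OSS logs "trying to BRW to non-existent file". The client's lock cancels fail with ESTALE (116) while the OSS rejects operation 103 from 10.65.200.37 with a bad export cookie. Together, these suggest that the OSS no longer recognises the client's export.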

<br>10.65.200.30 is my OSS. Both are generating load.<br><br><br><br><br><div class="gmail_quote">On Thu, Jan 27, 2011 at 3:17 PM, Nauman Yousuf <span dir="ltr"><<a href="mailto:nauman.yousuf@gmail.com">nauman.yousuf@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hey, on the Lustre client I got this error:<br><br><br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages<br>

LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>

LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message<br>LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2<br>


LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages<br>LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>


LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2<br>


LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages<br>LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90<br>


LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages<br>LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116<br>LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages<br>


<br><br><div class="gmail_quote"><div><div></div><div class="h5">On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <span dir="ltr"><<a href="mailto:brian@whamcloud.com" target="_blank">brian@whamcloud.com</a>></span> wrote:<br>

</div></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">

On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:<br>

><br>

<br>

Your logs don't have timestamps, so it's difficult to correlate events,<br>

but did you notice that right before you started getting these messages:<br>

<br>

<br>

> Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s<br>

> Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s<br>

> Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s<br>

> Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s<br>

<br>

You got this:<br>

<br>

> drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).<br>

> drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)<br>

> drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected<br>

<br>

I'm no DRBD expert by a long shot, but that looks to me like you had a<br>

disk in the MDS resyncing to its DRBD partner.  If that disk is the<br>

MDT, a resync is, of course, going to slow down the MDT.<br>
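The resync figures in the drbd0 lines above are self-consistent. A quick Python check also shows how long the disk was under resync load. It assumes DRBD's bitmap granularity of one bit per 4 KiB block. The function name `resync_summary` is hypothetical.

```python
# Figures copied from the drbd0 log lines quoted above.
BITS_SET = 158_686_961       # "[158686961 bits set]"
NEED_KB = 634_747_844        # "need to sync 634747844 KB"
TOTAL_SEC = 97_313           # "total 97313 sec"
REPORTED_KB_PER_SEC = 6_520  # "6520 K/sec"

KB_PER_BIT = 4  # assumption: DRBD's sync bitmap tracks one bit per 4 KiB block


def resync_summary():
    """Cross-check the DRBD resync log figures and express them in hours and GiB."""
    avg_kb_per_sec = NEED_KB / TOTAL_SEC
    return {
        "bitmap_matches": BITS_SET * KB_PER_BIT == NEED_KB,
        "avg_kb_per_sec": avg_kb_per_sec,
        "rate_error_pct": 100 * abs(avg_kb_per_sec - REPORTED_KB_PER_SEC) / REPORTED_KB_PER_SEC,
        "hours": TOTAL_SEC / 3600,
        "gib": NEED_KB / 1024**2,
    }


if __name__ == "__main__":
    for key, value in resync_summary().items():
        print(key, value)
```

The bitmap count times 4 KiB equals the KB figure exactly. The average rate works out to about 6,523 KB/s, against the 6,520 K/sec that DRBD reports. This amounts to roughly 605 GiB of background I/O spread over about 27 hours on drbd0.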

<br>

The problem here is that you are probably tuned (i.e. the number of<br>

threads) to expect full performance out of the hardware, and when it's<br>

under a resync load, it won't deliver it.<br>

<br>

Unfortunately, at this point Lustre will push its thread count higher if<br>

it can determine that it can get more performance out of a target, but it won't<br>

back off when things slow down (i.e. because the disk is being<br>

commandeered for housekeeping tasks such as a resync or RAID rebuild,<br>

etc.), so you need to cap your maximum thread count at a value that performs well<br>

while your disks are under a resync load.<br>

<br>

Please see the operations manual for details on tuning thread counts for<br>

performance.<br>
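Little's law offers one way to see why a thread count that is fine on a healthy disk can produce 30-second stalls during a resync. The Python sketch below is a simplified model, not Lustre's scheduler. It assumes that every service thread holds one outstanding disk operation and that the disk's throughput is the only limit. The throughput figures are hypothetical and were not measured in this thread.

```python
def mean_wait_seconds(busy_threads: int, disk_ops_per_second: float) -> float:
    """Little's law (N = X * R) for a closed system with no think time:
    with N threads each holding one request and the disk completing X
    requests per second, the mean response time is R = N / X."""
    if disk_ops_per_second <= 0:
        raise ValueError("disk_ops_per_second must be positive")
    return busy_threads / disk_ops_per_second


# Hypothetical throughputs, chosen only for illustration.
HEALTHY_OPS = 200.0  # disk ops/s with no resync running
RESYNC_OPS = 16.0    # disk ops/s while a resync competes for the same disk

if __name__ == "__main__":
    for threads in (64, 256, 512):
        healthy = mean_wait_seconds(threads, HEALTHY_OPS)
        resync = mean_wait_seconds(threads, RESYNC_OPS)
        print(f"{threads:4d} threads: {healthy:5.2f}s healthy, {resync:5.2f}s during resync")
```

Under these assumptions, once the disk is saturated the wait grows linearly with the number of busy threads. Extra threads then add queueing rather than throughput, which is why a lower cap helps while a resync is running. As far as I know, Lustre 1.6/1.8 sets these caps with the `oss_num_threads` and `mds_num_threads` module options. The operations manual is the authority on the exact syntax for your version.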

<br>

Cheers,<br>

<font color="#888888">b.<br>

<br>

</font><br></div></div><div class="im">_______________________________________________<br>

Lustre-discuss mailing list<br>

<a href="mailto:Lustre-discuss@lists.lustre.org" target="_blank">Lustre-discuss@lists.lustre.org</a><br>

<a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

<br></div></blockquote></div><br><br clear="all"><br><br>

</blockquote></div><br><br clear="all"><br>-- <br>Regards<br><br>Nauman Yousuf<br>0321-2549206<br>E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic, E-Energetic, R-Resourceful --- ENGINEER<br>