[Lustre-discuss] Lustre Issue

Nauman Yousuf nauman.yousuf at gmail.com
Thu Jan 27 04:25:52 PST 2011


Guys, I'm still having issues; somehow my client and OSS start getting CPU load when
this happens.

The OSS says:

LustreError: 1538:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103
from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937
LustreError: 1560:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103
from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937
LustreError: 1714:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017031
LustreError: 1708:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017031
LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017040
LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) Skipped 10
previous similar messages
LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017174
LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) Skipped 5
previous similar messages
LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28016970
LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) Skipped 12
previous similar messages
LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017244
LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) Skipped 17
previous similar messages
LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to
BRW to non-existent file 28017244
LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) Skipped 48
previous similar messages
drbd1: [ll_ost_io_23/1690] sock_sendmsg time expired, ko = 4294967295
Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow
direct_io 30s
Lustre: 1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow
commitrw commit 30s

10.65.200.37 is my Lustre client, which logs:

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 4 previous
similar messages
LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
failed: 116
LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) Skipped 3
previous similar messages
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -2  req@c229d200 x1219484/t0
o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous
similar messages
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
(nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1
previous similar message
LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0
o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous
similar messages
LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
(nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0
o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous
similar messages
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
(nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2
previous similar messages
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
failed: 116
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2
previous similar messages

10.65.200.30 is my OSS; both nodes are generating load.
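
For reference, this is roughly how I am checking which threads are generating
the load on both nodes. Treat it as a sketch: the lctl parameter paths below
are what I found for a 1.8.x-style /proc layout and may differ on other
versions.

  # on the OSS: per-thread CPU usage; ll_ost_io_* threads show up in the log above
  ps -eLo pcpu,pid,tid,comm --sort=-pcpu | grep -E 'll_ost|ldlm' | head

  # how many OST I/O service threads are running vs. the configured maximum
  lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max

  # on the client: ptlrpcd and ldlm callback threads are the usual suspects
  ps -eLo pcpu,pid,tid,comm --sort=-pcpu | grep -E 'ptlrpc|ldlm' | head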




On Thu, Jan 27, 2011 at 3:17 PM, Nauman Yousuf <nauman.yousuf at gmail.com> wrote:

> Hey, on the Lustre client I got these errors:
>
>
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17
> previous similar messages
> LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
> (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1
> previous similar message
> LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
> failed: 116
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
> PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0
> o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40
> previous similar messages
> LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
> (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
> failed: 116
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type ==
> PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0
> o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88
> previous similar messages
> LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server
> (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2
> previous similar messages
> LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel
> failed: 116
> LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2
> previous similar messages
>
>
> On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <brian at whamcloud.com> wrote:
>
>> On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:
>> >
>>
>> Your logs don't have timestamps so it's difficult to correlate events
>> but did you notice right before you started getting these messages:
>>
>>
>> > Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow
>> setattr 31s
>> > Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow
>> journal start 33s
>> > Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow
>> journal start 32s
>> > Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow
>> journal start 38s
>>
>> You got this:
>>
>> > drbd0: Resync started as SyncSource (need to sync 634747844 KB
>> [158686961 bits set]).
>> > drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
>> > drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected
>>
>> I'm no DRBD expert by a long shot, but that looks to me like you had a
>> disk in the MDS re-syncing to its DRBD partner.  If that disk is the
>> MDT, a resync is, of course, going to slow down the MDT.
>>
>> The problem here is that you are probably tuned (i.e. the number of
>> threads) to expect full performance out of the hardware, and when it's
>> under a resync load, it won't deliver that.
>>
>> Unfortunately, at this point Lustre will push its thread count higher if
>> it determines it can get more performance out of a target, but it won't
>> back off when things slow down (i.e. because the disk is being
>> commandeered for housekeeping tasks such as a resync or RAID rebuild,
>> etc.), so you need to set your maximum thread count to what performs well
>> while your disks are under a resync load.
>>
>> Please see the operations manual for details on tuning thread counts for
>> performance.
>>
>> Cheers,
>> b.
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>
>
>
>
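
Following up on Brian's point above: before touching thread counts I am going
to confirm whether the backing devices are mid-resync, and then cap the OSS
I/O threads at a level that still behaves while DRBD is syncing. This is a
rough sketch only; the lctl and module parameter names are taken from the
1.8-era manual and the drbdsetup syntax from DRBD 8.x, so please correct me if
they differ on other versions.

  # is the backing device resyncing right now? (DRBD 8.x)
  cat /proc/drbd

  # temporarily throttle the resync so it steals less I/O from Lustre
  # (the permanent setting lives in the syncer { rate ...; } section of drbd.conf)
  drbdsetup /dev/drbd0 syncer -r 10M

  # cap OST I/O service threads at runtime; threads that have already
  # started may not exit until the target is remounted
  lctl set_param ost.OSS.ost_io.threads_max=64

  # or persistently, via the ost module option in /etc/modprobe.conf:
  #   options ost oss_num_threads=64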


-- 
Regards

Nauman Yousuf
0321-2549206
E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic,
E-Energetic, R-Resourcefull --- ENGINEER

