[Lustre-discuss] Lustre Issue

Thu Jan 27 02:17:26 PST 2011

Hey, on the Lustre client I got these errors:

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message
LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages
LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages
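The numeric codes in these client messages are easier to read once mapped to errno names. Below is a minimal Python sketch, assuming the codes are standard Linux errno values (on Linux, 2 is ENOENT and 116 is ESTALE), reported either negated (`err == -2`) or positive (`failed: 116`). `decode_errors` and its regexes are hypothetical helpers written for this example, not part of any Lustre tool.

```python
import re

# Assumption: the codes in these messages are Linux errno values, reported
# either negated (err == -2) or positive (failed: 116). Only the two codes
# that appear in the log are listed here.
LINUX_ERRNO_NAMES = {
    2: "ENOENT",
    116: "ESTALE",
}

# Hypothetical patterns for the two message shapes that carry a code.
ERR_PATTERNS = [
    re.compile(r"err == (-?\d+)"),   # ptlrpc_check_status() messages
    re.compile(r"failed: (-?\d+)"),  # ll_extent_lock_callback() messages
]
# Function name inside e.g. "(file.c:754:ll_extent_lock_callback())".
FUNC_PATTERN = re.compile(r":(\w+)\(\)\)")


def decode_errors(log_text):
    """Return (function, errno name) pairs for each coded error in log_text."""
    results = []
    for line in log_text.splitlines():
        func = FUNC_PATTERN.search(line)
        for pattern in ERR_PATTERNS:
            match = pattern.search(line)
            if match:
                code = abs(int(match.group(1)))  # normalise -errno / +errno
                name = LINUX_ERRNO_NAMES.get(code, f"errno {code}")
                results.append((func.group(1) if func else "?", name))
    return results


sample = """\
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2
LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message
"""

for func, name in decode_errors(sample):
    print(func, name)
```

"Skipped ... similar messages" lines carry no code and are ignored, so feeding the whole client log through it gives one entry per reported failure.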

On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <brian at whamcloud.com> wrote:

> On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:
> >
>
> Your logs don't have timestamps, so it's difficult to correlate events,
> but did you notice that right before you started getting these messages:
>
>
> > Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s
> > Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s
> > Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s
> > Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s
>
> You got this:
>
> > drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
> > drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
> > drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected
>
> I'm no DRBD expert by a long shot, but that looks to me like you had a
> disk in the MDS resyncing to its DRBD partner.  If that disk is the
> MDT, a resync is, of course, going to slow down the MDT.
>
> The problem here is that you are probably tuned (i.e. the number of
> threads) to expect full performance out of the hardware, and when it's
> under a resync load, it won't deliver it.
>
> Unfortunately, at this point Lustre will push its thread count higher if
> it can determine it can get more performance out of a target, but it
> won't back off when things slow down (e.g. because the disk is being
> commandeered for housekeeping tasks such as a resync or RAID rebuild),
> so you need to cap your thread count at what performs well while your
> disks are under a resync load.
>
> Please see the operations manual for details on tuning thread counts for
> performance.
>
> Cheers,
> b.
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
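The DRBD figures quoted in the reply show how long the MDS disk was under resync load. A quick Python check using only the numbers from those log lines; the 4 KiB-per-bit bitmap granularity is an assumption about DRBD, though with it the "bits set" count and the KB figure agree exactly. `resync_summary` is a hypothetical helper for this example.

```python
# Figures copied from the drbd0 resync messages quoted above.
NEED_KB = 634_747_844   # "need to sync 634747844 KB"
BITS_SET = 158_686_961  # "[158686961 bits set]"
TOTAL_SEC = 97_313      # "total 97313 sec; paused 0 sec"

# Assumption: DRBD's sync bitmap tracks one bit per 4 KiB block.
KB_PER_BIT = 4


def resync_summary(need_kb, bits_set, total_sec):
    """Return (KB implied by the bitmap, duration in hours, average KB/s)."""
    implied_kb = bits_set * KB_PER_BIT
    hours = total_sec / 3600
    avg_kb_per_sec = need_kb / total_sec
    return implied_kb, hours, avg_kb_per_sec


implied_kb, hours, rate = resync_summary(NEED_KB, BITS_SET, TOTAL_SEC)
print(f"bitmap implies {implied_kb} KB, log says {NEED_KB} KB")
print(f"resync ran {hours:.1f} h at about {rate:.0f} KB/s")
```

The resync ran for just over 27 hours at roughly 6,500 KB/s, in line with the 6520 K/sec DRBD itself reported. That is a long window for the MDT to run below its normal performance, which is the situation the thread-count advice above is aimed at.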