[Lustre-discuss] Lustre Issue

Wed Jan 26 10:53:52 PST 2011

On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:
> 

Your logs don't have timestamps, so it's difficult to correlate events,
but did you notice that right before you started getting these messages:

> Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s
> Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s
> Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s
> Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s

You got this:

> drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
> drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
> drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected

I'm no DRBD expert by a long shot, but that looks to me like you had a
disk in the MDS resyncing to its DRBD partner.  If that disk is the
MDT, a resync is, of course, going to slow down the MDT.
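
To put some numbers on that resync, here is a rough sketch that pulls the
figures out of the two DRBD lines quoted above and does the arithmetic.
The variable names are mine, and the 4 KB-per-bit figure is simply what
falls out of dividing the two numbers in the "Resync started" line.

```python
import re

# The two DRBD messages quoted above, copied verbatim.
LOG = """\
drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
"""

started = re.search(r"need to sync (\d+) KB \[(\d+) bits set\]", LOG)
done = re.search(r"total (\d+) sec; paused (\d+) sec; (\d+) K/sec", LOG)

size_kb, bits = int(started.group(1)), int(started.group(2))
total_sec, paused_sec, reported_rate = (int(g) for g in done.groups())

kb_per_bit = size_kb // bits                    # data covered by one set bit
size_gib = size_kb / (1024 * 1024)              # amount of data resynced
hours = total_sec / 3600                        # wall-clock length of the resync
avg_rate = size_kb / (total_sec - paused_sec)   # KB/s while actually syncing

print(f"{kb_per_bit} KB per bit, {size_gib:.0f} GiB resynced")
print(f"{hours:.1f} hours at about {avg_rate:.0f} KB/s "
      f"(DRBD reported {reported_rate} K/sec)")
```

That works out to roughly 605 GiB resynced over about 27 hours, so the
disk was carrying the resync load for more than a day.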

The problem here is that you are probably tuned (i.e. the number of
threads) to expect full performance out of the hardware, and when it's
under a resync load, it won't deliver it.

Unfortunately, at this point Lustre will push its thread count higher if
it can determine it can get more performance out of a target, but it won't
back off when things slow down (e.g. because the disk is being
commandeered for housekeeping tasks such as a resync or RAID rebuild),
so you need to cap your maximum thread count at what performs well
while your disks are under a resync load.
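
One way to pick that thread count, as a minimal sketch only: benchmark the
MDT at several thread counts while a resync is running, then take the
smallest count that gets close to the best rate you saw.  The function
name, the 5% tolerance and the numbers in the usage example are all
hypothetical placeholders, not measurements from this system and not
anything Lustre provides.

```python
def pick_thread_cap(measurements, tolerance=0.05):
    """Pick a thread cap from benchmarks taken UNDER resync load.

    measurements: dict mapping thread count -> measured rate (e.g. ops/sec).
    Returns the smallest thread count whose rate is within `tolerance`
    of the best rate observed.
    """
    if not measurements:
        raise ValueError("no measurements given")
    best = max(measurements.values())
    good_enough = [n for n, rate in measurements.items()
                   if rate >= best * (1 - tolerance)]
    return min(good_enough)


# Placeholder numbers for illustration only; substitute your own results.
example = {8: 100.0, 16: 180.0, 32: 185.0, 64: 120.0}
print(pick_thread_cap(example))
```

Preferring the smallest count near the peak leaves some headroom, since
extra threads beyond that point only add contention on an already busy
disk.  The result is what you would then set as the maximum thread count
using the tunables described in the operations manual.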

Please see the operations manual for details on tuning thread counts for
performance.

Cheers,
b.
