[Lustre-discuss] poor ost write performance.

Jeremy Filizetti jeremy.filizetti at gmail.com
Wed Apr 20 09:58:59 PDT 2011


I think we are in agreement here, but my point is that adding a read
cache to Lustre (which is a feature I rely on heavily) isn't the
fundamental problem; it just exposed it.  The fundamental problem is
ext[34] performance on larger disks.
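
A quick way to see whether free-space fragmentation is what ext[34] is
struggling with is to look at the free-extent histogram on the ldiskfs
device (just a sketch; /dev/sdb1 is a placeholder for the actual OST
device, and e2freefrag needs a reasonably recent e2fsprogs):

    # Histogram of free-extent sizes on the OST.  Mostly-tiny free
    # extents mean the allocator has to scan far to place new writes.
    e2freefrag /dev/sdb1

On a healthy OST most free space sits in a few large extents; at 87-90%
full it tends to degrade into scattered small chunks, which is exactly
when ext[34] block allocation gets slow.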

Jeremy

On Wed, Apr 20, 2011 at 8:57 AM, Kevin Van Maren <kevin.van.maren at oracle.com
> wrote:

> Yes, difficulty finding free disk space can also be a problem, but I
> cannot recall any big changes in how that worked since 1.6, other than
> the memory pressure from the read cache pushing out the bitmaps.
> See http://jira.whamcloud.com/browse/LU-15
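>
> A rough way to watch for that (a sketch, not a definitive diagnosis)
> is to compare Buffers against Cached while an OST is slow, and check
> what the 1.8 read-cache tunables are set to:
>
>     # How much RAM is in the page cache vs. block-device buffers
>     grep -E 'MemFree|^Buffers|^Cached' /proc/meminfo
>
>     # Current OSS read-cache settings (Lustre 1.8 obdfilter)
>     lctl get_param obdfilter.*.read_cache_enable
>     lctl get_param obdfilter.*.readcache_max_filesize
>
> If Buffers keeps collapsing while Cached stays huge, the block-device
> buffers holding the allocation bitmaps are the ones being evicted.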
>
> Kevin
>
>
> James Rose wrote:
> > Hi Kevin,
> >
> > Thanks for the suggestion.  I will try this out.
> >
> > For the moment it seems that it may be disk space related.  I have
> > removed some data from the file system, and performance returned to
> > where I would expect it to be as space freed up (currently at 83%
> > full).  Since freeing space I have seen two "slow" messages on an
> > OSS where the number of threads is tuned to the amount of RAM in the
> > host, and six on an OSS that has the number of threads set higher
> > than it should be.  This is a much better situation than the steady
> > stream I was experiencing last night.  Maybe disabling the read
> > cache will remove the last few.
> >
> > I am still very curious about the rapid small reads seen while
> > writing, as these showed up with the OST mounted as ldiskfs, so no
> > regular Lustre operations were involved at all.
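> >
> > If anyone wants to look at the same thing, blktrace on the OST
> > device while a write is stalled shows the pattern directly (sketch;
> > /dev/sdb is a placeholder for the OST device):
> >
> >     # Live-trace block IO on the OST device; watch for bursts of
> >     # small reads ('R') interleaved with the stalled writes
> >     blktrace -d /dev/sdb -o - | blkparse -i -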
> >
> > Thanks again for your help,
> >
> > James.
> >
> >
> > On Wed, 2011-04-20 at 08:48 -0300, Kevin Van Maren wrote:
> >
> >> First guess is the increased memory pressure caused by the Lustre 1.8
> >> read cache.  Many times "slow" messages are caused by memory
> >> allocations taking a long time.
> >>
> >> You could try disabling the read cache and see if that clears up the
> >> slow messages.
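> >>
> >> For the archives, turning the read cache off on a 1.8 OSS is a
> >> one-liner per tunable (takes effect immediately, but set_param
> >> values do not survive a remount):
> >>
> >>     # Disable the OSS read cache on all OSTs on this server
> >>     lctl set_param obdfilter.*.read_cache_enable=0
> >>
> >>     # Optionally also disable the writethrough cache
> >>     lctl set_param obdfilter.*.writethrough_cache_enable=0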
> >>
> >> Kevin
> >>
> >>
> >> On Apr 20, 2011, at 4:29 AM, James Rose <James.Rose at framestore.com>
> >> wrote:
> >>
> >>
> >>> Hi
> >>>
> >>> We have been experiencing degraded performance for a few days on a
> >>> fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs).
> >>> The initial bulk load of the data is fine, but once the file
> >>> system has been in use for a while, writes to an individual OST
> >>> become very slow.  This blocks IO for a few minutes and then
> >>> carries on as normal.  The slow writes then move to another OST.
> >>> This can be seen in iostat, and many slow IO messages appear in
> >>> the logs (example included below).
> >>>
> >>> The OSTs are between 87 and 90% full.  Not ideal, but this caused
> >>> no issues running 1.6.7.2 on the same hardware.
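> >>>
> >>> Usage per OST can be checked from any client with:
> >>>
> >>>     # Per-OST capacity and usage, as the clients see it
> >>>     lfs df -h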
> >>>
> >>> The OSTs are RAID6 on external RAID chassis (Infortrend).  Each
> >>> OST is 5.4T (small).  The server is dual AMD (4 cores), with 16G
> >>> RAM and a QLogic FC HBA.
> >>>
> >>> I mounted the OSTs as ldiskfs and tried a few write tests.  These
> >>> also showed the same behaviour.
> >>>
> >>> While the write operation is blocked there are hundreds of read
> >>> tps and a very small kB/s read from the RAID, but no writes.  As
> >>> soon as this completes, writes go through at a more expected speed.
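> >>>
> >>> A minimal version of that kind of test (device and mount point are
> >>> placeholders, not my exact commands):
> >>>
> >>>     # Mount one OST directly as ldiskfs and stream writes at it
> >>>     mount -t ldiskfs /dev/sdb1 /mnt/ost_test
> >>>     dd if=/dev/zero of=/mnt/ost_test/testfile bs=1M count=4096
> >>>
> >>>     # In another shell, watch reads vs writes per device
> >>>     iostat -x 2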
> >>>
> >>> Any idea what is going on?
> >>>
> >>> Many thanks
> >>>
> >>> James.
> >>>
> >>> Example error messages:
> >>>
> >>> Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/
> >>> lustre-log.1303271584.3935
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota
> >>> init 286s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal
> >>> start 39s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow
> >>> brw_start 39s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal
> >>> start 133s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow
> >>> brw_start 133s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal
> >>> start 236s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex
> >>> 40s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex
> >>> 277s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow
> >>> direct_io 286s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous
> >>> similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal
> >>> start 285s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous
> >>> similar message
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow
> >>> commitrw commit 285s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous
> >>> similar message
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent
> >>> lock 236s due to heavy IO load
> >>>
> >>>