[Lustre-discuss] Poor OST write performance

James Rose James.Rose at framestore.com
Wed Apr 20 05:28:59 PDT 2011


Hi Kevin,

Thanks for the suggestion.  I will try this out.  

For the moment it seems that it may be disk space related.  I have
removed some data from the file system, and performance returned to
where I would expect it to be as space freed up (currently at 83%
full).  Since freeing space I have seen two slow messages on an OSS
where the number of threads is tuned to the amount of RAM in the host,
and six on an OSS that has the number of threads set higher than it
should be.  This is a much better situation than the steady stream I
was experiencing last night.  Maybe disabling the read cache will
remove the last few.
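
For reference, this is roughly what I plan to run on the OSSes.  A
sketch only: I am assuming the standard 1.8 obdfilter and ost tunables
here; the wildcards will match whatever targets are on your system.

  # how full each OST is (run on the OSS)
  lctl get_param obdfilter.*.kbytesfree obdfilter.*.kbytestotal

  # current OSS I/O thread settings
  lctl get_param ost.OSS.ost_io.threads_max ost.OSS.ost_io.threads_started

  # disable the 1.8 OSS read cache as suggested (and optionally the
  # writethrough cache, which populates the same page cache on writes)
  lctl set_param obdfilter.*.read_cache_enable=0
  lctl set_param obdfilter.*.writethrough_cache_enable=0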

I am still very curious what the rapid small reads seen when writing
are, as this showed up while the OST was mounted as ldiskfs, so it was
not doing regular Lustre operations at all.
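
In case anyone wants to reproduce what I was looking at, this is
roughly the test I ran while the OST was mounted as ldiskfs (the mount
point and file name are just examples from my setup):

  # in one shell: stream writes to the ldiskfs-mounted OST
  dd if=/dev/zero of=/mnt/ost/ddtest bs=1M count=4096 oflag=direct

  # in another shell: during the stall, r/s on the OST device climbs
  # into the hundreds while w/s drops to zero
  iostat -x 1

My working guess is that with the OSTs this full, the allocator has to
scan a lot of block bitmaps to find free extents, which would explain
the many tiny reads.  If the OSS has a recent enough e2fsprogs,
e2freefrag should show how fragmented the free space is:

  e2freefrag /dev/sdX   # replace sdX with the OST block device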

Thanks again for your help,

James.


On Wed, 2011-04-20 at 08:48 -0300, Kevin Van Maren wrote:
> First guess is the increased memory pressure caused by the Lustre 1.8  
> read cache.  Many times "slow" messages are caused by memory  
> allocations taking a long time.
> 
> You could try disabling the read cache and see if that clears up the  
> slow messages.
> 
> Kevin
> 
> 
> On Apr 20, 2011, at 4:29 AM, James Rose <James.Rose at framestore.com>  
> wrote:
> 
> > Hi
> >
> > We have been experiencing degraded performance for a few days on a
> > fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs).
> > The initial bulk load of the data will be fine, but once in use for
> > a while, writes become very slow to individual OSTs.  This will
> > block I/O for a few minutes and then carry on as normal.  The slow
> > writes will then move to another OST.  This can be seen in iostat,
> > and many slow IO messages will be seen in the logs (example
> > included).
> >
> > The OSTs are between 87 and 90% full.  Not ideal, but this has not
> > caused any issues running 1.6.7.2 on the same hardware.
> >
> > The OSTs are RAID6 on external RAID chassis (Infortrend).  Each OST
> > is 5.4T (small).  The server is dual AMD (4 cores), 16G RAM, QLogic
> > FC HBA.
> >
> > I mounted the OSTs as ldiskfs and tried a few write tests.  These
> > also show the same behaviour.
> >
> > While the write operation is blocked there will be hundreds of read
> > TPS and a very small kB/s read from the RAID, but no writes.  As
> > soon as this completes, writes will go through at a more expected
> > speed.
> >
> > Any idea what is going on?
> >
> > Many thanks
> >
> > James.
> >
> > Example error messages:
> >
> > Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/lustre-log.1303271584.3935
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 39s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 133s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 133s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 236s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 40s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow direct_io 286s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous similar messages
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> > Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent lock 236s due to heavy IO load