[Lustre-discuss] poor ost write performance.

Kevin Van Maren Kevin.Van.Maren at oracle.com
Wed Apr 20 04:48:51 PDT 2011


My first guess is the increased memory pressure caused by the Lustre 1.8
read cache. "Slow" messages are often caused by memory
allocations taking a long time.

You could try disabling the read cache and see if that clears up the  
slow messages.

Kevin


On Apr 20, 2011, at 4:29 AM, James Rose <James.Rose at framestore.com> wrote:

> Hi
>
> We have been experiencing degraded performance for a few days on a
> fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs). The
> initial bulk load of the data was fine, but after the system has been
> in use for a while, writes to individual OSTs become very slow. This
> blocks I/O for a few minutes, and then it carries on as normal. The slow
> writes then move to another OST. This can be seen in iostat, and many
> slow I/O messages appear in the logs (example included below).
>
> The OSTs are between 87% and 90% full. Not ideal, but this did not cause
> any issues when running 1.6.7.2 on the same hardware.
>
> The OSTs are RAID6 on external RAID chassis (Infortrend). Each OST
> is 5.4 TB (small). The server is a dual AMD (4 cores) with 16 GB RAM
> and a QLogic FC HBA.
>
> I mounted the OSTs as ldiskfs and tried a few write tests. These
> show the same behaviour.
>
> While the write operation is blocked, there are hundreds of read
> tps and a very small read rate (kB/s) from the RAID, but no writes. As
> soon as this completes, writes go through at a more expected
> speed.
>
> Any idea what is going on?
>
> Many thanks
>
> James.
>
> Example error messages:
>
> Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/lustre-log.1303271584.3935
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 39s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 133s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 133s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 236s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 40s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow direct_io 286s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent lock 236s due to heavy IO load
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
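To see at a glance which operations stalled longest on which OST in a log like James's, a small script can pull out the "slow ... Ns due to heavy IO load" messages. This is a minimal Python sketch, assuming the message wording shown in the excerpt above (other Lustre releases may word these messages differently); the names `SLOW_RE`, `SAMPLE_LOG` and `worst_stalls` are hypothetical and not part of any Lustre tool. It ignores the "Skipped N previous similar messages" lines, so it reports the longest stall per operation rather than a full count of events.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the log excerpt above, e.g.
# "Lustre: rho-OST0012: slow journal start 39s due to heavy IO load".
SLOW_RE = re.compile(
    r"Lustre: (?P<target>\S+): slow (?P<op>.+?) (?P<secs>\d+)s due to heavy IO load"
)

# A few lines taken from the excerpt, used here as sample input.
SAMPLE_LOG = [
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load",
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load",
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages",
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load",
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load",
    "Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load",
]


def worst_stalls(lines):
    """Return {(target, operation): longest reported stall in seconds}.

    "Skipped N previous similar messages" lines do not match the pattern
    and are ignored, so suppressed duplicates are not counted.
    """
    worst = defaultdict(int)
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            key = (m.group("target"), m.group("op"))
            worst[key] = max(worst[key], int(m.group("secs")))
    return dict(worst)


if __name__ == "__main__":
    # Print the stalls, longest first.
    stalls = worst_stalls(SAMPLE_LOG)
    for (target, op), secs in sorted(stalls.items(), key=lambda kv: -kv[1]):
        print(f"{target:12} {op:16} {secs:4d}s")
```

Because `worst_stalls` accepts any iterable of lines, it can equally be pointed at an open syslog file on the OSS instead of the sample list.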


