[Lustre-discuss] poor ost write performance.

James Rose James.Rose at framestore.com
Wed Apr 20 00:29:17 PDT 2011


Hi 

We have been experiencing degraded performance for a few days on a fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs).  The initial bulk load of the data was fine, but once the filesystem has been in use for a while, writes to an individual OST become very slow.  This blocks IO for a few minutes and then carries on as normal.  The slow writes then move to another OST.  This can be seen in iostat, and many slow IO messages appear in the logs (example included).

The OSTs are between 87% and 90% full.  Not ideal, but this did not cause any issues running 1.6.7.2 on the same hardware.

The OSTs are RAID6 on external RAID chassis (Infortrend).  Each OST is 5.4T (small).  The server is dual AMD (4 cores) with 16G RAM and a QLogic FC HBA.

I mounted the OSTs as ldiskfs and tried a few write tests.  These also show the same behaviour.

While the write operation is blocked there are hundreds of read tps and a very small kB/s read rate from the RAID, but no writes.  As soon as this completes, writes go through at a more expected speed.

Any idea what is going on? 

Many thanks

James.

Example error messages:

Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/lustre-log.1303271584.3935
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 39s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 133s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 133s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 236s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 40s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow direct_io 286s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous similar messages
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent lock 236s due to heavy IO load
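
To see which operations stall longest on which OST, messages like the ones above can be tallied with a short script.  This is only a rough sketch: the regular expression is inferred from the sample lines in this post rather than from any Lustre documentation, and the function name summarize_slow_io is made up for illustration.  The counts will undercount, since the kernel folds repeats into the "Skipped N previous similar messages" lines.

```python
import re
import sys
from collections import defaultdict

# Pattern inferred from the sample messages in this post (an assumption), e.g.
#   "... kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load"
SLOW_RE = re.compile(
    r"Lustre: (?P<ost>\S+): slow (?P<op>.+?) (?P<secs>\d+)s due to heavy IO load"
)


def summarize_slow_io(lines):
    """Return {(ost, operation): (message_count, worst_delay_seconds)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SLOW_RE.search(line)
        if match is None:
            # Not a slow-IO message (e.g. "Skipped N previous similar messages").
            continue
        entry = stats[(match.group("ost"), match.group("op"))]
        entry[0] += 1
        entry[1] = max(entry[1], int(match.group("secs")))
    return {key: tuple(value) for key, value in stats.items()}


if __name__ == "__main__":
    # Reads syslog text from stdin and lists the worst delays first.
    summary = summarize_slow_io(sys.stdin)
    for (ost, op), (count, worst) in sorted(
        summary.items(), key=lambda item: item[1][1], reverse=True
    ):
        print(f"{ost} {op}: {count} message(s), worst {worst}s")
```

Saved under a hypothetical name such as slow_io_summary.py, it could be fed the syslog on stdin, for example "python slow_io_summary.py < /var/log/messages" (the log path depends on the syslog configuration).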




