[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?

Dilger, Andreas andreas.dilger at intel.com
Thu Dec 6 11:41:38 PST 2012


On 12/6/12 12:06 PM, "Grigory Shamov" <gas5x at yahoo.com> wrote:

>Hi,
>
>On our cluster, when the Lustre FS is under load, it sometimes slows
>down precipitously, and the OSS servers log a great many "slow IO" and
>"slow setattr" messages:
>
>=======
>[2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy
>IO load
>[2988758.408974] Lustre: Skipped 276 previous similar messages
>[2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy
>IO load
>[2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy
>IO load
>[2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to
>heavy IO load
>[2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to
>heavy IO load
>[2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to
>heavy IO load
>[2988823.125419] Lustre: Skipped 1 previous similar message
>[2988823.125432] Lustre: scratch-OST0004: slow preprw_write setup 55s due
>to heavy IO load
>[2988856.236914] Lustre: scratch-OST0004: slow direct_io 33s due to heavy
>IO load
>[2988856.236922] Lustre: Skipped 323 previous similar messages
>[2988892.543942] Lustre: scratch-OST0004: slow i_mutex 48s due to heavy
>IO load
>[2988892.543950] Lustre: Skipped 280 previous similar messages
>[2988892.545310] Lustre: scratch-OST0004: slow setattr 55s due to heavy
>IO load
>[2988892.547328] Lustre: scratch-OST0004: slow parent lock 42s due to
>heavy IO load
>[2988892.547334] Lustre: Skipped 4 previous similar messages
>[2988958.306720] Lustre: scratch-OST0004: slow setattr 52s due to heavy
>IO load
>[2988958.306724] Lustre: Skipped 1 previous similar message
>[2988958.310818] Lustre: scratch-OST0004: slow parent lock 59s due to
>heavy IO load
>[2989040.406738] Lustre: scratch-OST0004: slow setattr 50s due to heavy
>IO load
>=========
>
>I wonder whether mounting it on the clients with "noatime" and/or
>changing atime_diff would help get rid of these Lustre slowdowns? Right
>now /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server
>is 60.

No atime updates are ever written to disk on the OSTs, and on the MDT
they are written at most once every 10 minutes.  The slowdown is very
likely due to small IO from the clients or similar.  Check
"lctl get_param obdfilter.*.brw_stats" on the OSS nodes to see what kind
of IO pattern the clients are sending.
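
Before digging into brw_stats, it can also help to see which operations
the "slow ..." messages cluster around.  The following is only a rough
sketch, not a Lustre tool: it tallies a few of the console messages
quoted above (with the mail line-wrapping undone).  The function name
summarize_slow_ops and the regular expression are my own assumptions
about the message layout, based solely on the lines in this thread.

```python
import re
from collections import defaultdict

# A few of the OSS console messages quoted in this thread,
# with the mail line-wrapping undone.
SAMPLE_LOG = """\
[2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy IO load
[2988758.408974] Lustre: Skipped 276 previous similar messages
[2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
[2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy IO load
[2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to heavy IO load
[2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to heavy IO load
[2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to heavy IO load
"""

# Assumed layout: "Lustre: <target>: slow <operation> <N>s due to heavy IO load"
SLOW_RE = re.compile(r"Lustre: (\S+): slow (.+?) (\d+)s due to heavy IO load")


def summarize_slow_ops(log_text):
    """Return {(target, operation): (message_count, worst_seconds)}.

    Messages suppressed by "Skipped N previous similar messages" are not
    counted, so the counts are a lower bound.
    """
    stats = defaultdict(lambda: [0, 0])
    for line in log_text.splitlines():
        match = SLOW_RE.search(line)
        if match is None:
            continue  # not a "slow ..." message (e.g. a "Skipped" line)
        target, operation, seconds = match.group(1), match.group(2), int(match.group(3))
        entry = stats[(target, operation)]
        entry[0] += 1
        entry[1] = max(entry[1], seconds)
    return {key: tuple(value) for key, value in stats.items()}


if __name__ == "__main__":
    for (target, operation), (count, worst) in sorted(summarize_slow_ops(SAMPLE_LOG).items()):
        print(f"{target} {operation}: {count} message(s), worst {worst}s")
```

Feeding it the full console log from each OSS would show whether a single
OST or a single operation dominates.  It does not replace looking at
brw_stats for the actual IO sizes.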

>I tried to Google it first, and found that apparently "noatime" is
>not supported in 1.8, and that changing atime_diff is the preferred way?
>
>Could you please advise me which way is better/possible, and how does
>one change atime_diff?  Will it help? Does it require, say, remounting
>the clients, etc.?
>
>Any ideas and advice would be greatly appreciated! Thank you very much in
>advance.
>
>
>--
>Grigory Shamov
>HPC Analyst, Westgrid/Compute Canada
>E2-588 EITC Building, University of Manitoba
>(204) 474-9625
>
>




