[Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?

Colin Faber colin_faber at xyratex.com
Thu Dec 6 12:01:09 PST 2012


Hi Grigory,

The active-active failover configuration should make no difference here 
unless you're running block-level replication between the hosts (which is 
outside the scope of Lustre).

What tuning do you currently have in place? Also, what kind of client 
workload are you experiencing (large or small file I/O)?
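For example, the output of something like this from an OSS would be useful 
(assuming the 1.8.x proc layout; adjust the parameter names if your build 
differs):

    lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max
    lctl get_param ost.OSS.ost_io.threads_started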

-cf


On 12/06/2012 12:45 PM, Grigory Shamov wrote:
> Dear Colin,
>
> Thanks for the reply!
>
> We reduced the number of OST threads earlier, from the original DDN setting of 256 down to 160, roughly as in the sketch below. That seems to have made things better, but the problem still persists. Reducing the number of OST threads below the number of clients seems to cause problems as well.
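>
> For reference, we lowered it at runtime on each OSS roughly like this (the parameter name is from memory, so it may need checking against the 1.8.7 proc layout):
>
>     lctl set_param ost.OSS.ost_io.threads_max=160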
>
> Also, do you know whether having the OSS servers in an active-active failover configuration affects Lustre performance? Could it be forcing a sync on all I/O, or something of that sort?
>
>
>
> --
> Grigory Shamov
>
>
> --- On Thu, 12/6/12, Colin Faber <colin_faber at xyratex.com> wrote:
>
>> From: Colin Faber <colin_faber at xyratex.com>
>> Subject: Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
>> To: "Grigory Shamov" <gas5x at yahoo.com>
>> Cc: lustre-discuss at lists.lustre.org
>> Date: Thursday, December 6, 2012, 11:28 AM
>> Hi,
>>
>> The messages indicate overloaded backend storage. You could try that;
>> another option may be to statically set the maximum number of threads
>> on the OSS, which should reduce load on the system and (hopefully)
>> push the backlogs out to your clients. A rough sketch is below.
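>>
>> For example (a sketch from memory; the exact module option should be
>> double-checked against the 1.8.x manual for your build):
>>
>>     # Statically set the number of OSS service threads via the ost module.
>>     # Add to /etc/modprobe.conf on each OSS; takes effect at module (re)load:
>>     options ost oss_num_threads=128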
>>
>> -cf
>>
>>
>> On 12/06/2012 12:06 PM, Grigory Shamov wrote:
>>> Hi,
>>>
>>> On our cluster, when there is load on the Lustre FS, at certain points it slows down precipitously, and there are very many "slow IO" and "slow setattr" messages on the OSS servers:
>>> =======
>>> [2988758.408968] Lustre: scratch-OST0004: slow i_mutex 51s due to heavy IO load
>>> [2988758.408974] Lustre: Skipped 276 previous similar messages
>>> [2988760.309388] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
>>> [2988822.617865] Lustre: scratch-OST0004: slow setattr 62s due to heavy IO load
>>> [2988822.689819] Lustre: scratch-OST0004: slow journal start 48s due to heavy IO load
>>> [2988822.690627] Lustre: scratch-OST0004: slow journal start 56s due to heavy IO load
>>> [2988823.125410] Lustre: scratch-OST0004: slow parent lock 55s due to heavy IO load
>>> [2988823.125419] Lustre: Skipped 1 previous similar message
>>> [2988823.125432] Lustre: scratch-OST0004: slow preprw_write setup 55s due to heavy IO load
>>> [2988856.236914] Lustre: scratch-OST0004: slow direct_io 33s due to heavy IO load
>>> [2988856.236922] Lustre: Skipped 323 previous similar messages
>>> [2988892.543942] Lustre: scratch-OST0004: slow i_mutex 48s due to heavy IO load
>>> [2988892.543950] Lustre: Skipped 280 previous similar messages
>>> [2988892.545310] Lustre: scratch-OST0004: slow setattr 55s due to heavy IO load
>>> [2988892.547328] Lustre: scratch-OST0004: slow parent lock 42s due to heavy IO load
>>> [2988892.547334] Lustre: Skipped 4 previous similar messages
>>> [2988958.306720] Lustre: scratch-OST0004: slow setattr 52s due to heavy IO load
>>> [2988958.306724] Lustre: Skipped 1 previous similar message
>>> [2988958.310818] Lustre: scratch-OST0004: slow parent lock 59s due to heavy IO load
>>> [2989040.406738] Lustre: scratch-OST0004: slow setattr 50s due to heavy IO load
>>> =======
>>>
>>> I wonder if mounting it on the clients with "noatime" and/or changing atime_diff would help get rid of these Lustre slowdowns? Right now, /proc/fs/lustre/mds/scratch-MDT0000/atime_diff on our MDS server is set to 60.
>>> I tried Googling this first, and found that "noatime" is apparently not supported for 1.8, and that changing atime_diff is the preferred way?
>>> Could you please advise which way is better/possible, and how one changes atime_diff? Will it help? Does it require, say, a remount on the clients? Something like the sketch below, perhaps?
>>> Any ideas and advice would be greatly appreciated! Thank you very much in advance.
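>>> For instance (just guessing at the syntax here; the mount line and the parameter name are my assumptions, not something I have verified on 1.8.7):
>>>
>>>     # Candidate 1: remount the clients with noatime (if 1.8 honors it at all);
>>>     # <mgsnid> is a placeholder for our MGS NID.
>>>     mount -t lustre -o noatime <mgsnid>:/scratch /scratch
>>>
>>>     # Candidate 2: raise atime_diff on the MDS, e.g. from 60 to 600 seconds.
>>>     lctl set_param mds.scratch-MDT0000.atime_diff=600
>>>     # Or via the proc file directly (presumably not persistent across a remount):
>>>     echo 600 > /proc/fs/lustre/mds/scratch-MDT0000/atime_diff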
>>>
>>> --
>>> Grigory Shamov
>>> HPC Analyst, Westgrid/Compute Canada
>>> E2-588 EITC Building, University of Manitoba
>>> (204) 474-9625
>>>
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>



