[Lustre-discuss] brief 'hangs' on file operations

Andreas Dilger andreas.dilger at oracle.com
Thu Sep 2 09:01:28 PDT 2010


On 2010-09-02, at 06:43, Tina Friedrich wrote:
> What is causing the most grief at the moment is that we sometimes see
> delays writing files. From the writing client's end, it simply looks as
> if I/O stops for a while (we've seen 'pauses' of anything up to 10
> seconds). This appears to be independent of which client does the
> writing and of the software doing the writing. We investigated this a bit using strace and
> dd; the 'slow' calls appear to always be either open, write, or close 
> calls. Usually, these take well below 0.001s; in around 0.5% or 1% of 
> cases, they take up to multiple seconds. It does not seem to be 
> associated with any specific OST, OSS, client or anything; there is 
> nothing in any log files or any exceptional load on MDS or OSS or any of 
> the clients.

This is most likely associated with delays in committing the journal on the MDT or OST, which can happen if the journal fills completely.  Larger journals can help, provided you have enough RAM to keep them all in memory without overflowing.  Alternatively, making the journals small will limit the latency, at the cost of reducing overall performance.  A third option might be to use SSDs for the journal devices.
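As a rough way to repeat the measurement described in the question (timing individual open, write and close calls, as was done with strace and dd), something like the following Python sketch could be run on a client. It is not a Lustre tool: the function name `probe_file_ops`, the 4 KiB write size, the file naming and the 1-second "slow" threshold are all illustrative assumptions.

```python
import os
import statistics
import tempfile
import time


def probe_file_ops(directory, iterations=1000, block=b"x" * 4096, threshold=1.0):
    """Time open/write/close on fresh files in `directory`.

    Returns, per operation, the median and maximum duration in seconds and
    the fraction of calls that took longer than `threshold` seconds.
    """
    timings = {"open": [], "write": [], "close": []}
    for i in range(iterations):
        # Hypothetical file naming; each iteration uses a new file.
        path = os.path.join(directory, f"probe_{i}.dat")
        t0 = time.perf_counter()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        t1 = time.perf_counter()
        os.write(fd, block)
        t2 = time.perf_counter()
        os.close(fd)
        t3 = time.perf_counter()
        timings["open"].append(t1 - t0)
        timings["write"].append(t2 - t1)
        timings["close"].append(t3 - t2)
        os.unlink(path)  # clean up; the unlink itself is not timed

    summary = {}
    for op, durations in timings.items():
        slow = sum(1 for d in durations if d > threshold)
        summary[op] = {
            "median_s": statistics.median(durations),
            "max_s": max(durations),
            "slow_fraction": slow / len(durations),
        }
    return summary


if __name__ == "__main__":
    # A local temporary directory keeps the sketch runnable anywhere;
    # point `directory` at a Lustre mount to look for the stalls.
    with tempfile.TemporaryDirectory() as d:
        for op, stats in probe_file_ops(d, iterations=200).items():
            print(f"{op:5s} median={stats['median_s']:.6f}s "
                  f"max={stats['max_s']:.6f}s slow={stats['slow_fraction']:.2%}")
```

In practice `directory` would be a path on the Lustre file system; a `slow_fraction` around 0.005 to 0.01 with a median well under a millisecond would match the pattern reported above.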

> The other issue is that we frequently see delays when trying to read a 
> file. It sometimes takes more than 60s for a file to be visible on a
> machine after the initial write on a different machine has completed 
> (both machines being Lustre clients). Again, there is nothing in the 
> logs, nor exceptional load on any of the machines.

This is probably just a manifestation of the first problem.  The issue likely isn't in the read, but in a delay flushing the data from the writing client's cache.  Fixes were made in 1.8 to increase the I/O priority of clients writing data under a lock that other clients are waiting on.
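To put a number on the visibility delay from the second question, the reading client could poll for the file and record how long it takes to appear with its full size. The sketch below assumes polling starts once the writer's close() has returned (how the two clients coordinate that is left out); the function name, defaults and polling interval are illustrative assumptions. It only measures the delay; it does not change how the writing client flushes its cache.

```python
import os
import time


def wait_for_file(path, expected_size=0, timeout=120.0, poll_interval=0.1):
    """Poll until `path` exists with at least `expected_size` bytes.

    Returns the number of seconds waited, or None if `timeout` elapses first.
    """
    start = time.monotonic()
    while True:
        try:
            if os.stat(path).st_size >= expected_size:
                return time.monotonic() - start
        except FileNotFoundError:
            pass  # not visible on this client yet
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(poll_interval)
```

On the reading client this might be called as `wait_for_file("/mnt/lustre/shared/output.dat", expected_size=1 << 20)`, where the path and size are hypothetical. A result far above the polling interval would reproduce the delay described.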

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



