[Lustre-discuss] brief 'hangs' on file operations

Tina Friedrich Tina.Friedrich at diamond.ac.uk
Thu Sep 2 09:36:44 PDT 2010

Hi Bernd,

On 02/09/10 17:21, Bernd Schubert wrote:
> On Thursday, September 02, 2010, Andreas Dilger wrote:
>> On 2010-09-02, at 06:43, Tina Friedrich wrote:
>>> What is causing the most grief at the moment is that we sometimes see
>>> delays writing files. From the writing client's end, it simply looks as
>>> if I/O stops for a while (we've seen 'pauses' of up to 10 seconds).
>>> This appears to be independent of which client does the writing and of
>>> the software doing it. We investigated this a bit using strace and dd;
>>> the 'slow' calls always appear to be open, write, or close calls.
>>> Usually these take well under 0.001 s; in around 0.5% to 1% of cases,
>>> they take up to several seconds. It does not seem to be associated with
>>> any specific OST, OSS, or client; there is nothing in any log files and
>>> no exceptional load on the MDS, the OSSes, or any of the clients.
>> This is most likely associated with delays in committing the journal on
>> the MDT or OST, which can happen if the journal fills completely. Larger
>> journals can help, provided you have enough RAM to keep them all in
>> memory without overflowing. Alternatively, making the journals smaller
>> will limit the latency, at the cost of reduced overall performance. A
>> third option might be to use SSDs for the journal devices.
> Because Diamond uses DDN hardware, updating to 1.8 and enabling the async
> journal feature should help performance in general. I guess it might also
> help reduce those delays, as writes are better optimized.

We have been considering an update; however, because this is a production
file system (and an important one), that's not something we can do easily.

> A question, though, Tina: do you use our DDN udev rules, which tune the
> devices for optimized performance? If not, please send a mail to
> support at ddn.com and ask for a recent udev rpm (available only for RHEL5
> so far; it *might* also work on SLES11, but udev syntax changes too often,
> IMHO). And please put [lustre] in the subject line, as the Lustre team
> maintains them.

Well, I don't think so, though I'm not 100% sure. There does not appear to
be anything DDN-specific in our udev rules (which makes me think that's a
'no'). I have sent an email requesting them and shall look into that as
well.
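The latency measurement described in the report above (timing individual
open, write, and close calls and counting the rare slow ones) was done with
strace and dd. A rough Python equivalent might look like the sketch below.
It is hypothetical: the function name time_file_ops, the probe file names,
the 4 KiB write size, and the 1 ms threshold are all assumptions, not what
was actually run. Pointing it at a directory on the Lustre mount should show
whether the 0.5% to 1% of multi-second calls reproduce. Writes may be
absorbed by the client's page cache, so, like strace, this only measures
what the application sees.

```python
import os
import sys
import tempfile
import time


def time_file_ops(directory, count=1000, block=b"x" * 4096, threshold=0.001):
    """Create, write, and close `count` probe files in `directory`, timing
    each system call separately (hypothetical helper, for illustration).

    Returns (slow, total): `slow` is a list of (op_name, seconds) for calls
    slower than `threshold`, and `total` is the number of timed calls.
    """
    slow = []
    total = 0
    for i in range(count):
        # Hypothetical probe file name; removed again after each iteration.
        path = os.path.join(directory, f"latency_probe_{i}.dat")
        t0 = time.perf_counter()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        t1 = time.perf_counter()
        os.write(fd, block)
        t2 = time.perf_counter()
        os.close(fd)
        t3 = time.perf_counter()
        os.unlink(path)  # not timed; only open/write/close are measured
        for op, dt in (("open", t1 - t0), ("write", t2 - t1), ("close", t3 - t2)):
            total += 1
            if dt > threshold:
                slow.append((op, dt))
    return slow, total


if __name__ == "__main__":
    # Assumption: pass a directory on the Lustre mount as the first argument;
    # otherwise the local temporary directory is used.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    slow, total = time_file_ops(target)
    print(f"{len(slow)} of {total} calls exceeded 1 ms "
          f"({100.0 * len(slow) / total:.2f}%)")
    # Show the ten slowest calls, longest first.
    for op, dt in sorted(slow, key=lambda s: s[1], reverse=True)[:10]:
        print(f"  {op:5s} {dt:.3f} s")
```

Timing each call separately, rather than the whole loop, is what makes a
rare multi-second stall visible instead of being averaged away.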


