[Lustre-discuss] fseeks on lustre

Ronald K Long rklong at usgs.gov
Fri Apr 16 10:26:21 PDT 2010


After doing some more digging, it looks as though a bug was reported on
this in 2007:

https://bugzilla.lustre.org/show_bug.cgi?id=12739

We have loaded the Lustre patch attached to this bug; however, when
running the set_param command I am getting the following error.

lctl set_param llite*.*.stat_blksize=4096
error: set_param: /proc/{fs,sys}/{lnet,lustre}/llite/lustre*/stat_blksize: No such process

Is this patch still valid for 2.6.9-78.0.22.EL_lustre.1.6.7.2smp?

Thanks again

Rocky




From: Andreas Dilger <andreas.dilger at oracle.com>
To: Ronald K Long <rklong at usgs.gov>
Cc: "Brian J. Murrell" <Brian.Murrell at Sun.COM>,
    lustre-discuss at lists.lustre.org, lustre-discuss-bounces at lists.lustre.org
Date: 04/14/2010 02:13 PM
Subject: Re: [Lustre-discuss] fseeks on lustre



On 2010-04-14, at 11:08, Ronald K Long wrote:
> We've narrowed down the problem quite a bit.
>
> The problematic code snippet is not actually doing any reads or writes;
> it's just doing a massive number of fseek() operations within a couple
> of nested loops.  (Note: The production code is doing some I/O, but this
> snippet was narrowed down to the bare minimum example that exhibited the
> problem, which was how we discovered that fseek was the culprit.)
>
> The issue appears to be the behavior of the glibc implementation of
> fseek().  Apparently, a call to fseek() on a buffered file stream causes
> glibc to flush the stream (regardless of whether a flush is actually
> needed).  If we modify the snippet to call setvbuf() and disable
> buffering on the file stream before any of the fseek() calls, then it
> finishes more or less instantly, as you would expect.

I'd encourage you to file a bug (preferably with a patch) against 
glibc to fix this.  I've had reasonable success in getting problems 
like this fixed upstream.
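
For anyone else who hits this: below is a minimal sketch of the
setvbuf() workaround described above. The file name and seek pattern
are made up for illustration; the key point is that setvbuf() must be
called before the first read or seek on the stream.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* hypothetical data file on a Lustre mount */
    FILE *fp = fopen("/lustre/scratch/data.hdf", "rb");
    if (fp == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    /* Disable stdio buffering before any fseek() calls.  With
     * buffering enabled, glibc flushes/refills the stream buffer on
     * every fseek(), whether or not a flush is actually needed,
     * which on Lustre can mean a multi-megabyte read per seek. */
    setvbuf(fp, NULL, _IONBF, 0);

    /* a massive number of seeks, as in the narrowed-down snippet */
    for (long i = 0; i < 1000000; i++)
        fseek(fp, (i * 512) % 4096, SEEK_SET);

    fclose(fp);
    return EXIT_SUCCESS;
}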

> The problem is that this offending code is actually buried deep within a
> COTS library that we're using to do image processing (the Hierarchical
> Data Format (HDF) library).  While we do have access to the source code
> for this library and could conceivably modify it, this is a large and
> complex library, and a change of this nature would require us to do a
> large amount of regression testing to ensure that nothing was broken.
>
> So at the end of the day this is really not a "Lustre problem" per se,
> though we would still be interested in any suggestions as to how we can
> minimize the effects of this glibc "flush penalty".  This penalty is not
> particularly onerous when reading and writing to local disk, but is
> obviously more of an issue with a distributed filesystem.

Similarly, HDF + Lustre usage is very common, and I expect that the 
HDF developers would be interested to fix this if possible.

> On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
> >
> > Andreas - Here is a snippet of the strace output.
> >
> > read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2097152) = 2097152
>
> As Andreas suspected, your application is doing 2MB reads every time.
> Does it really need 2MB of data on each read?  If not, can you fix your
> application to only read as much data as it actually wants?
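
If disabling buffering entirely is not an option (e.g. the library
expects a buffered stream), another approach is to cap the stdio
buffer rather than remove it.  glibc sizes the default stream buffer
from the file's st_blksize, which Lustre reports as the (large)
stripe size -- that is where the 2MB read() calls above come from,
and what the stat_blksize tunable in bug 12739 addresses.  A sketch,
with a hypothetical helper name:

#include <stdio.h>

/* Open a file read-only with a 4KB stdio buffer instead of the
 * default, which glibc sizes from st_blksize. */
FILE *open_small_buffered(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp != NULL)
        setvbuf(fp, NULL, _IOFBF, 4096);  /* glibc allocates the buffer */
    return fp;
}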


Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.




