[Lustre-discuss] fseeks on lustre

Andreas Dilger andreas.dilger at oracle.com
Wed Apr 14 12:06:38 PDT 2010


On 2010-04-14, at 11:08, Ronald K Long wrote:
> We've narrowed down the problem quite a bit.
>
> The problematic code snippet is not actually doing any reads or writes;
> it's just doing a massive number of fseek() operations within a couple
> of nested loops.  (Note: The production code is doing some I/O, but this
> snippet was narrowed down to the bare minimum example that exhibited the
> problem, which was how we discovered that fseek was the culprit.)
>
> The issue appears to be the behavior of the glibc implementation of
> fseek().  Apparently, a call to fseek() on a buffered file stream causes
> glibc to flush the stream (regardless of whether a flush is actually
> needed).  If we modify the snippet to call setvbuf() and disable
> buffering on the file stream before any of the fseek() calls, then it
> finishes more or less instantly, as you would expect.

I'd encourage you to file a bug (preferably with a patch) against  
glibc to fix this.  I've had reasonable success in getting problems  
like this fixed upstream.
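
For reference, the setvbuf() workaround described above can be sketched
roughly as follows. This is only a minimal illustration, not the code from
the HDF library: the helper names open_unbuffered() and seek_many() are
hypothetical, and the loop merely stands in for the seek-heavy nested loops
in the original snippet. The one firm requirement is that setvbuf() be
called after the stream is opened and before any other operation on it.

```c
#include <stdio.h>

/* Hypothetical helper: open a file for reading with stdio buffering
 * disabled. setvbuf() must be called before any other operation on
 * the stream, so it is done immediately after fopen(). */
FILE *open_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Hypothetical stand-in for the seek-only loops: performs `count`
 * fseek() calls and returns how many of them failed. */
long seek_many(FILE *fp, long count, long stride)
{
    long failures = 0;
    for (long i = 0; i < count; i++) {
        if (fseek(fp, (i % 16) * stride, SEEK_SET) != 0)
            failures++;
    }
    return failures;
}
```

Note the trade-off: with buffering disabled, every fread() or fgetc() on
the stream goes straight to the kernel, so code that also does many small
reads may get slower even though the seeks get cheaper.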

> The problem is that this offending code is actually buried deep within a
> COTS library that we're using to do image processing (the Hierarchical
> Data Format (HDF) library).  While we do have access to the source code
> for this library and could conceivably modify it, this is a large and
> complex library, and a change of this nature would require us to do a
> large amount of regression testing to ensure that nothing was broken.
>
> So at the end of the day this is really not a "Lustre problem" per se,
> though we would still be interested in any suggestions as to how we can
> minimize the effects of this glibc "flush penalty".  This penalty is not
> particularly onerous when reading and writing to local disk, but is
> obviously more of an issue with a distributed filesystem.

Similarly, HDF + Lustre usage is very common, and I expect that the
HDF developers would be interested in fixing this if possible.

> On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
> >
> > Andreas - Here is a snipet of the strace output.
> >
> > read(3,  
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> > \0\0"..., 2097152) = 2097152
>
> As Andreas suspected, your application is doing 2MB reads every time.
> Does it really need 2MB of data on each read?  If not, can you fix your
> application to only read as much data as it actually wants?
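
One way an application might limit the size of each read, sketched under
an assumption: that the 2MB reads in the strace output come from stdio
refilling a buffer whose size was taken from the filesystem's reported
preferred block size, rather than from the application asking for 2MB
itself. If so, giving the stream a smaller, caller-supplied buffer with
setvbuf() should shrink each refill. The helper name open_small_buffer()
and the 4096-byte size are illustrative choices, not taken from this
thread.

```c
#include <stdio.h>

/* Assumed, illustrative buffer size; tune it for the real access pattern. */
#define SMALL_BUF_SIZE 4096

/* Hypothetical helper: open a file for reading with a fully buffered
 * stream that uses a caller-supplied buffer of `size` bytes. The buffer
 * must stay valid until the stream is closed. */
FILE *open_small_buffer(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    /* Must happen before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

A caller would typically pass a static or heap-allocated array, for
example `static char buf[SMALL_BUF_SIZE];`, and can confirm the effect by
checking the read() sizes in strace again.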


Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.



