[Lustre-discuss] fseeks on lustre
Andreas Dilger
andreas.dilger at oracle.com
Wed Apr 14 12:06:38 PDT 2010
On 2010-04-14, at 11:08, Ronald K Long wrote:
> We've narrowed down the problem quite a bit.
>
> The problematic code snippet is not actually doing any reads or writes;
> it's just doing a massive number of fseek() operations within a couple
> of nested loops. (Note: The production code is doing some I/O, but this
> snippet was narrowed down to the bare minimum example that exhibited the
> problem, which was how we discovered that fseek was the culprit.)
>
> The issue appears to be the behavior of the glibc implementation of
> fseek(). Apparently, a call to fseek() on a buffered file stream causes
> glibc to flush the stream (regardless of whether a flush is actually
> needed). If we modify the snippet to call setvbuf() and disable
> buffering on the file stream before any of the fseek() calls, then it
> finishes more or less instantly, as you would expect.
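A minimal C sketch of the setvbuf() workaround described above. The original snippet was not posted, so `seek_loops()` is a hypothetical stand-in for the nested fseek() loops and `open_unbuffered()` is a hypothetical helper; the only real calls are standard C `fopen()`, `setvbuf()` with `_IONBF`, and `fseek()`. The detail that matters is that setvbuf() must be called before any other operation on the stream.

```c
#include <stdio.h>

/* Hypothetical stand-in for the narrowed-down snippet: nested loops that
 * call only fseek(), with no reads or writes in between. */
static long seek_loops(FILE *fp, int outer, int inner)
{
    long ok = 0;
    for (int i = 0; i < outer; i++) {
        for (int j = 0; j < inner; j++) {
            /* Arbitrary offsets; the real access pattern is not known. */
            if (fseek(fp, (long)i * inner + j, SEEK_SET) == 0)
                ok++;
        }
    }
    return ok;
}

/* Hypothetical helper: open a stream with stdio buffering disabled.
 * setvbuf() must be the first operation on the stream, so it is called
 * immediately after fopen(). Returns NULL on failure. */
static FILE *open_unbuffered(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

For example, `FILE *fp = open_unbuffered(path, "r+b");` followed by `seek_loops(fp, 1000, 1000);`. With buffering disabled, every fread()/fwrite() becomes its own system call, so this suits code paths dominated by seeks rather than by many small reads.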
I'd encourage you to file a bug (preferably with a patch) against
glibc to fix this. I've had reasonable success in getting problems
like this fixed upstream.
> The problem is that this offending code is actually buried deep within a
> COTS library that we're using to do image processing (the Hierarchical
> Data Format (HDF) library). While we do have access to the source code
> for this library and could conceivably modify it, this is a large and
> complex library, and a change of this nature would require us to do a
> large amount of regression testing to ensure that nothing was broken.
>
> So at the end of the day this is really not a "Lustre problem" per se,
> though we would still be interested in any suggestions as to how we can
> minimize the effects of this glibc "flush penalty". This penalty is not
> particularly onerous when reading and writing to local disk, but is
> obviously more of an issue with a distributed filesystem.
Similarly, HDF + Lustre usage is very common, and I expect that the
HDF developers would be interested to fix this if possible.
> On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
> >
> > Andreas - Here is a snippet of the strace output.
> >
> > read(3,
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> > \0\0"..., 2097152) = 2097152
>
> As Andreas suspected, your application is doing 2MB reads every time.
> Does it really need 2MB of data on each read? If not, can you fix your
> application to only read as much data as it actually wants?
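If the 2MB read() calls come from stdio refilling its buffer after each fseek() rather than from the application explicitly asking for 2MB (an assumption; glibc normally sizes a stream's buffer from the st_blksize the filesystem reports, which can be large on Lustre), one option short of disabling buffering entirely is to give the stream a smaller, explicitly sized buffer. `open_with_buffer()` below is a hypothetical helper; it uses only standard `fopen()`, `malloc()`, and `setvbuf()` with `_IOFBF`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: open a fully buffered stream that uses a
 * caller-sized buffer instead of the default one. glibc ignores `size`
 * when setvbuf() is given a NULL buffer, so the buffer is allocated here.
 * It must outlive the stream: call fclose() first, then free(*buf_out).
 * Returns NULL (and sets *buf_out to NULL) on failure. */
static FILE *open_with_buffer(const char *path, const char *mode,
                              size_t size, char **buf_out)
{
    *buf_out = NULL;
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return NULL;
    char *buf = malloc(size);
    /* setvbuf() must be the first operation on the stream. */
    if (buf == NULL || setvbuf(fp, buf, _IOFBF, size) != 0) {
        fclose(fp);
        free(buf);
        return NULL;
    }
    *buf_out = buf;
    return fp;
}
```

For example, `open_with_buffer(path, "rb", 64 * 1024, &buf)`; the 64 KiB figure is illustrative, not a tuned value. This only helps where the caller controls the fopen() call, which may not be the case for streams opened inside HDF.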
Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.