[Lustre-discuss] fseeks on lustre
Ronald K Long
rklong at usgs.gov
Wed Apr 14 11:08:30 PDT 2010
We've narrowed down the problem quite a bit.
The problematic code snippet is not actually doing any reads or writes;
it's just doing a massive number of fseek() operations within a couple
of nested loops. (Note: The production code is doing some I/O, but this
snippet was narrowed down to the bare minimum example that exhibited the
problem, which was how we discovered that fseek was the culprit.)
The issue appears to be the behavior of the glibc implementation of
fseek(). Apparently, a call to fseek() on a buffered file stream causes
glibc to flush the stream (regardless of whether a flush is actually
needed). If we modify the snippet to call setvbuf() and disable
buffering on the file stream before any of the fseek() calls, then it
finishes more or less instantly, as you would expect.
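
For reference, the setvbuf() workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual snippet or the HDF code: the helper name seek_many_unbuffered and its stride and count parameters are made up for the example. Only setvbuf(), fseek() and _IONBF are standard C.

```c
#include <stdio.h>

/* Hypothetical helper (not from the HDF library): disable stdio buffering
 * on a freshly opened stream, then seek to `count` offsets spaced `stride`
 * bytes apart. Returns 0 on success, -1 on failure. */
int seek_many_unbuffered(FILE *fp, long stride, long count)
{
    /* setvbuf() has to be called after the stream is opened and before any
     * other operation on it. _IONBF turns buffering off entirely. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0)
        return -1;

    for (long i = 0; i < count; i++) {
        /* With no stdio buffer there is nothing for the library to
         * discard or refill between seeks. */
        if (fseek(fp, i * stride, SEEK_SET) != 0)
            return -1;
    }
    return 0;
}
```

The trade-off is that every later fread() or fwrite() on an unbuffered stream goes straight to the kernel, so this only makes sense where seeks dominate the I/O pattern.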
The problem is that this offending code is actually buried deep within a
COTS library that we're using to do image processing (the Hierarchical
Data Format (HDF) library). While we do have access to the source code
for this library and could conceivably modify it, this is a large and
complex library, and a change of this nature would require us to do a
large amount of regression testing to ensure that nothing was broken.
So at the end of the day this is really not a "Lustre problem" per se,
though we would still be interested in any suggestions as to how we can
minimize the effects of this glibc "flush penalty". This penalty is not
particularly onerous when reading and writing to local disk, but is
obviously more of an issue with a distributed filesystem.
Thank you again for the support.
Rocky
On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
>
> Andreas - Here is a snippet of the strace output.
>
> read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> \0\0"..., 2097152) = 2097152
As Andreas suspected, your application is doing 2MB reads every time.
Does it really need 2MB of data on each read? If not, can you fix your
application to only read as much data as it actually wants?
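
If the 2MB reads come from the default stdio buffer size rather than from the application's own read requests (an assumption; glibc normally sizes that buffer from the block size the filesystem reports), one possible middle ground is to hand the stream a smaller buffer with setvbuf() instead of disabling buffering altogether. A rough sketch, where the helper name read_at_small_buffer and the 4096-byte size are arbitrary choices for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: read `len` bytes at `offset` from `path` through a
 * 4 KiB stdio buffer. Returns the number of bytes read, or -1 on error. */
long read_at_small_buffer(const char *path, long offset, void *dst, size_t len)
{
    char buf[4096];   /* must stay valid until fclose() below */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Supply our own buffer: glibc ignores the size argument when the
     * buffer pointer is NULL, so passing NULL would keep the default. */
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0 ||
        fseek(fp, offset, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }

    size_t n = fread(dst, 1, len, fp);
    fclose(fp);       /* stream is closed before buf goes out of scope */
    return (long)n;
}
```

With a buffer this small, a refill after each seek should cost a few kilobytes rather than a couple of megabytes, though the right size depends on how much data the application actually consumes between seeks.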
b.