[Lustre-discuss] fseeks on lustre

Ronald K Long rklong at usgs.gov
Wed Apr 14 11:08:30 PDT 2010


We've narrowed down the problem quite a bit.

The problematic code snippet is not actually doing any reads or writes; 
it's just doing a massive number of fseek() operations within a couple 
of nested loops.  (Note: The production code does do some I/O, but we 
reduced it to the bare minimum example that still exhibited the problem, 
which is how we discovered that fseek() was the culprit.)

The issue appears to be the behavior of the glibc implementation of 
fseek().  Apparently, a call to fseek() on a buffered file stream causes 
glibc to flush the stream (regardless of whether a flush is actually 
needed).  If we modify the snippet to call setvbuf() and disable 
buffering on the file stream before any of the fseek() calls, then it 
finishes more or less instantly, as you would expect.
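
For illustration, the setvbuf() workaround might look roughly like the 
sketch below.  The helper names open_unbuffered() and seek_many() are 
hypothetical (they are not from our code or from HDF), and the loop is 
only a stand-in for the real nested loops.  The one firm requirement is 
that setvbuf() be called after the stream is opened and before any other 
operation on it.

```c
#include <stdio.h>

/* Hypothetical helper: open a file for reading and disable stdio
 * buffering before any fseek()/fread() call touches the stream. */
FILE *open_unbuffered(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* setvbuf() must come before any other operation on the stream.
     * _IONBF turns buffering off; the buffer and size are then ignored. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* Stand-in for the reduced test case: a seek-only loop with no reads
 * or writes.  Returns the number of fseek() calls that succeeded. */
long seek_many(FILE *fp, long count, long stride)
{
    long ok = 0;
    for (long i = 0; i < count; i++) {
        if (fseek(fp, i * stride, SEEK_SET) == 0)
            ok++;
    }
    return ok;
}
```

Timing seek_many() on a stream from plain fopen() versus one from 
open_unbuffered() should show the difference, assuming the behavior 
described above holds on your system.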

The problem is that this offending code is actually buried deep within a 
COTS library that we're using to do image processing (the Hierarchical 
Data Format (HDF) library).  While we do have access to the source code 
for this library and could conceivably modify it, this is a large and 
complex library, and a change of this nature would require us to do a 
large amount of regression testing to ensure that nothing was broken.

So at the end of the day this is really not a "Lustre problem" per se, 
though we would still be interested in any suggestions as to how we can 
minimize the effects of this glibc "flush penalty".  This penalty is not 
particularly onerous when reading and writing to local disk, but is 
obviously more of an issue with a distributed filesystem.

Thank you again for the support.

Rocky 




On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
> 
> Andreas - Here is a snippet of the strace output. 
> 
> read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
> \0\0"..., 2097152) = 2097152 

As Andreas suspected, your application is doing 2MB reads every time.
Does it really need 2MB of data on each read?  If not, can you fix your
application to only read as much data as it actually wants?
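
As a rough sketch of what "read only what you want" could look like when 
the application controls its own I/O, the hypothetical helper below 
bypasses stdio and asks the kernel for exactly the requested bytes with 
pread().  The name read_exact_at() and its interface are assumptions for 
illustration only; nothing here comes from the application in question.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pread() under strict -std= modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `len` bytes starting at `offset`.
 * Returns the number of bytes read (short only at end of file),
 * or -1 on error. */
ssize_t read_exact_at(const char *path, void *buf, size_t len, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                 /* end of file */
        done += (size_t)n;
    }

    close(fd);
    return (ssize_t)done;
}
```

With this approach the size of each read() seen in strace is whatever 
the caller passes as len, rather than the size of a stdio buffer.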

b.
