[Lustre-discuss] short writes

Kevin Van Maren kevin.van.maren at oracle.com
Thu Jul 8 06:53:54 PDT 2010


Hi David,

I've also seen short writes on local file systems; I can't even count
the number of times I've modified codes to use wrappers that handle
short reads/writes.  I'm not at all surprised you see them when
suspending the app.
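For illustration, here is a minimal sketch of such a write wrapper, assuming a POSIX system. The name write_all is hypothetical and not part of any library. It keeps calling write() until every byte has been written, advancing past partial writes and retrying when the call fails with EINTR before any data was transferred.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes from buf to fd.
 * Retries after short writes (e.g. a signal arriving mid-write) and
 * after EINTR. Returns 0 on success, or -1 with errno set by write(). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything: retry */
            return -1;         /* genuine I/O error: report it */
        }
        p += n;                /* short write: skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

An application would call write_all(fd, buf, len) wherever it currently treats a short return from write() as fatal.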

http://www.opengroup.org/onlinepubs/000095399/functions/write.html
"If write() is interrupted by a signal after it successfully writes some 
data, it shall return the number of bytes written."
Similar language exists for read as well.  I always thought libc should 
handle the retry for you by default, but I didn't write the spec.
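The read side needs one extra case: read() returning 0 means end of file, so the loop must stop there rather than retry forever. A minimal sketch, again assuming POSIX; read_full is a hypothetical name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd into buf.
 * Retries after short reads and EINTR. Returns the number of bytes
 * read, which is less than len only if end of file was reached,
 * or -1 with errno set by read(). */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before reading anything: retry */
            return -1;         /* genuine I/O error: report it */
        }
        if (n == 0)
            break;             /* end of file: no more data will arrive */
        total += (size_t)n;    /* short read: keep filling the buffer */
    }
    return (ssize_t)total;
}
```

Returning the byte count (rather than treating a short total as an error) lets the caller decide whether hitting end of file early is acceptable.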

Signals are relatively rare, and the window is a bit smaller for a local
file system, which may be why the application developers haven't seen
it, or haven't properly dealt with it, yet.

Kevin


David Singleton wrote:
> The POSIX standard pretty clearly allows short writes to occur (number of
> bytes written less than requested in a successful call to write), but it's
> not something you see very often, and I don't think many users/applications
> expect it to occur when writing to disk-based files.  We are seeing it
> fairly regularly and just wanted to confirm that we (or rather our users)
> should expect this behaviour from Lustre.
>
> We are seeing the issue with the infamous Gaussian quantum chem code
> which spends literally days constantly writing and reading to scratch files
> in roughly 1GB chunks as part of out-of-core solvers.  We manage jobs using
> simple SIGSTOP/SIGCONT based suspend/resume and occasionally jobs will flag
> a short write immediately after a SIGCONT. The application incorrectly
> treats this as an error and aborts.  Adding code to complete the write
> appears to fix the problem (as you'd hope).  Now we are at the stage of
> "debating" with the application developers whether it's their problem or
> Lustre's.
>
> Is this considered normal Lustre behaviour?
>
> This is with 1.8.3 clients on 2.6.27.46.
>
> Thanks,
> David