[Lustre-discuss] short writes

John Hammond jhammond at ices.utexas.edu
Thu Jul 8 16:51:51 PDT 2010


On 07/08/2010 05:48 PM, Kevin Van Maren wrote:
> John Hammond wrote:
>> On 07/08/2010 08:53 AM, Kevin Van Maren wrote:
>>> Hi David,
>>>
>>> I've also seen short writes on local file systems -- can't even
>>> count the number of times I've modified codes to use wrappers
>>> that handle short reads/writes. Not at all surprised you see
>>> them when suspending the app.
>>>
>>> http://www.opengroup.org/onlinepubs/000095399/functions/write.html
>>>
>>> "If write() is interrupted by a signal after it successfully writes some
>>> data, it shall return the number of bytes written." Similar
>>> language exists for read as well. I always thought libc should
>>> handle the retry for you by default, but I didn't write the
>>> spec.
>>>
>>> Signals are relatively rare, and the window is a bit smaller for
>>> a local file system, which may be why they haven't seen
>>> it/properly dealt with it yet.
>>
>> It also says "The issue of which files or file types are
>> interruptible is considered an implementation design issue. This
>> is often affected primarily by hardware and reliability issues."
>>
>> For Linux, the signal(7) manpage indicates that read(2), readv(2),
>> write(2), writev(2), and ioctl(2) calls on "slow" devices should
>> return -EINTR when interrupted by a signal, and goes on to say
>> that "slow" devices are ones "where the I/O call may block for an
>> indefinite time, for example, a terminal, pipe, or socket. (A disk
>> is not a slow device according to this definition.)"
>
> How about a network file system waiting for server failover
> (especially if it is not automatic)?

That's not indefinite.  The FS is waiting for something which will
eventually occur (assuming it is correctly administered).

>> Nowhere does it say something really helpfully clear like "Writing
>> to a regular file shall suspend the calling process until such
>> time as..." But, I interpret this to mean that operations on
>> regular files are not interruptible, and should not return -EINTR.
>> Moreover, I understand that this is the consensus among those
>> unlucky enough to care.
>>
>> On the other hand, there are some explicitly specified situations
>> which will result in short writes to a regular file, like file
>> size limits.
>
> With NFS, "hard,intr" is the most sane configuration.

Yes, because NFS is FTP but with less typing.

> For Lustre, operations (should) become interruptible after the
> initial timeout period has passed.

I disagree.  Lustre is not NFS.  The intended uses are big
noninteractive jobs.  Who would interrupt them?  Most likely
administrators who know that some server is hosed.  Better to have a
long timeout, and once it passes, the operations return -EIO, the
clients try to reconnect, and maybe the FS can heal itself.  Even if it
can't, I would argue that the FS is still easier to administer than
before, since you don't have to ssh out to every stuck node.

Also, why make the logic any harder?  In the FS, isn't it much easier to
emulate a block device, leaving the process in D sleep when you have to?
And in the application it's one less thing to worry about.  Plus the
behavior matches the block device/page cache mental model better.
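
For applications that have to cope with short writes anyway, the kind of
wrapper Kevin mentions at the top of the thread might look roughly like
the sketch below.  It is only an illustration in Python, not code from
anyone in this thread: write_all is a made-up name, and raising an error
on a zero-byte write is my own assumption about what a caller would want.

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, looping over short writes.

    Returns the total number of bytes written. `write_all` is a
    hypothetical helper name, not part of any library.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the buffer
    total = 0
    while total < len(view):
        try:
            # os.write() may accept fewer bytes than requested (a short write).
            n = os.write(fd, view[total:])
        except InterruptedError:
            # EINTR before any data was written. Python 3.5+ normally
            # retries this case itself (PEP 475), so this is defensive.
            continue
        if n == 0:
            # Assumption: treat "no progress" as an error rather than spin.
            raise OSError("write returned 0 bytes; giving up")
        total += n
    return total
```

A caller would use write_all(fd, buf) wherever it currently calls
os.write(fd, buf) and ignores the return value.  Note that the loop only
papers over partial writes; a hard failure such as EIO still surfaces
as an exception.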

-- 
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
jhammond at ices.utexas.edu
(512) 471-9304


