[Lustre-devel] Vector I/O api

Peter Braam Peter.Braam at Sun.COM
Sat Jul 12 13:23:08 PDT 2008


Hi -

1024 segments is fine.

Readv is the wrong call: it reads only a single contiguous area of a file (into several buffers).
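The following is a minimal sketch of Peter's point. It assumes a POSIX system, and `readv_demo` is a hypothetical name. readv() scatters one contiguous stretch of the file, starting at the current offset, across several memory buffers. It has no way to name separate file extents.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo: write `data` to a fresh temporary file, rewind, then
 * readv() it into two 4-byte buffers.  Both buffers are filled from ONE
 * contiguous file range starting at offset 0.  Returns readv()'s result,
 * or -1 if the setup fails. */
static ssize_t readv_demo(const char *data, char a[5], char b[5])
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);
    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) != 0) {
        fclose(tmp);
        return -1;
    }
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = 4 },   /* gets bytes 0..3 */
        { .iov_base = b, .iov_len = 4 },   /* gets bytes 4..7 */
    };
    ssize_t n = readv(fd, iov, 2);
    a[4] = '\0';
    b[4] = '\0';
    fclose(tmp);
    return n;
}
```

With the input "abcdefgh", the first buffer receives "abcd" and the second "efgh". There is no parameter to request, for example, bytes 0..3 and 100..103 instead.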

Readx/writex sound good, but making this available asap through our I/O
library is important.

It should be coded to keep the number of network round trips needed to
complete the I/O as small as practical.
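One possible way to cut round trips is sketched below under stated assumptions. It sorts the requested extents by offset and merges any that overlap or touch, so that each merged run could be sent as a single request. The `struct extent` type and the `coalesce_extents()` helper are hypothetical and not part of Lustre. A real client would also have to keep the mapping from merged runs back to the caller's buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical extent descriptor (same shape as the proposed struct xtvec). */
struct extent {
    off_t  off;   /* starting file offset */
    size_t len;   /* number of bytes */
};

/* qsort() comparator: order extents by starting offset. */
static int cmp_extent(const void *pa, const void *pb)
{
    const struct extent *a = pa, *b = pb;
    if (a->off < b->off) return -1;
    if (a->off > b->off) return 1;
    return 0;
}

/* Sort extents by offset and merge those that overlap or are adjacent,
 * in place.  Returns the number of merged extents left at the front. */
static size_t coalesce_extents(struct extent *ext, size_t n)
{
    if (n == 0)
        return 0;
    qsort(ext, n, sizeof(*ext), cmp_extent);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        off_t end = ext[out].off + (off_t)ext[out].len;
        if (ext[i].off <= end) {
            /* Overlapping or adjacent: extend the current run if needed. */
            off_t new_end = ext[i].off + (off_t)ext[i].len;
            if (new_end > end)
                ext[out].len = (size_t)(new_end - ext[out].off);
        } else {
            ext[++out] = ext[i];   /* gap: start a new run */
        }
    }
    return out + 1;
}
```

For example, the extents {100,50}, {0,10}, {10,20}, {140,20} and {500,1} collapse to three runs: {0,30}, {100,60} and {500,1}.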

So what are our options?


On 7/12/08 12:15 PM, "Tom.Wang" <Tom.Wang at Sun.COM> wrote:

> Hello,
> 
> Yes, I just checked the source: we could use sys_readv here.
> There is a limit of 1024 I/O segments per call, but that should
> not be a problem here. In fact, llite already includes such an
> API (ll_file_readv/writev), so it should be easy to implement this
> in our library. Sorry for my earlier, confusing reply.
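The following minimal sketch shows one way to stay within the per-call segment limit, assuming POSIX readv(). The hypothetical helper `readv_batched()` issues readv() in batches of at most IOV_MAX segments. If the header does not define IOV_MAX, it falls back to the 1024 figure quoted above. Each batch is still a separate call, and together they still read one contiguous file range.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef IOV_MAX
#define IOV_MAX 1024   /* assumed fallback: the per-call limit discussed here */
#endif

/* Hypothetical helper: read into `iovcnt` buffers from the current file
 * position, calling readv() with at most IOV_MAX segments at a time.
 * Stops at the first short read (EOF or partial transfer).  Returns the
 * total bytes read, or -1 if an error occurs before anything was read. */
static ssize_t readv_batched(int fd, const struct iovec *iov, size_t iovcnt)
{
    ssize_t total = 0;
    while (iovcnt > 0) {
        size_t batch = iovcnt < (size_t)IOV_MAX ? iovcnt : (size_t)IOV_MAX;
        size_t want = 0;
        for (size_t i = 0; i < batch; i++)
            want += iov[i].iov_len;
        ssize_t n = readv(fd, iov, (int)batch);
        if (n < 0)
            return total > 0 ? total : -1;
        total += n;
        if ((size_t)n < want)
            break;              /* short read: EOF or partial transfer */
        iov += batch;           /* readv() already advanced the file offset */
        iovcnt -= batch;
    }
    return total;
}
```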
> 
> Thanks
> WangDi
> 
> Eric Barton wrote:
>> Wangdi,
>> 
>> There seems to be some momentum behind getting readx/writex
>> adopted as POSIX standard system calls.  That seems the right
>> API to exploit (or anticipate if it's not implemented yet).
>> 
>> Note that the memory and file descriptors are not required to
>> be isomorphic (i.e. file and memory fragments don't have to
>> correspond directly).
>> 
>> struct iovec {
>>         void   *iov_base; /* Starting address */
>>         size_t  iov_len;  /* Number of bytes */
>> };
>> 
>> struct xtvec {
>>         off_t   xtv_off; /* Starting file offset */
>>         size_t  xtv_len; /* Number of bytes */
>> };
>> 
>> ssize_t readx(int fd, const struct iovec *iov, size_t iov_count,
>>               struct xtvec *xtv, size_t xtv_count);
>> 
>> ssize_t writex(int fd, const struct iovec *iov, size_t iov_count,
>>                struct xtvec *xtv, size_t xtv_count);
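Below is a user-space emulation of the proposed readx() semantics, offered only as a sketch. `readx_emul()` is a hypothetical name. `struct xtvec` is defined locally because no system header provides it, and the file list is taken as `const`. The function walks the memory list and the file list in step, so the two need not be isomorphic, and it issues one pread() per overlapping piece. It makes no attempt to reduce round trips. A writex() counterpart would be symmetric, using pwrite().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* The proposed file-extent descriptor; no system header defines it. */
struct xtvec {
    off_t  xtv_off;  /* starting file offset */
    size_t xtv_len;  /* number of bytes */
};

/* Hypothetical emulation of readx(): walk the memory fragments and the
 * file extents in step, issuing one pread() for each piece where the
 * current fragment and the current extent overlap.  Stops early at EOF.
 * Returns total bytes read, or -1 (errno set) if an error occurs before
 * anything was read. */
static ssize_t readx_emul(int fd, const struct iovec *iov, size_t iov_count,
                          const struct xtvec *xtv, size_t xtv_count)
{
    size_t mi = 0, mo = 0;   /* current iovec index, offset within it */
    size_t fi = 0, fo = 0;   /* current xtvec index, offset within it */
    ssize_t total = 0;

    while (mi < iov_count && fi < xtv_count) {
        size_t mleft = iov[mi].iov_len - mo;
        size_t fleft = xtv[fi].xtv_len - fo;
        if (mleft == 0) { mi++; mo = 0; continue; }   /* next buffer */
        if (fleft == 0) { fi++; fo = 0; continue; }   /* next extent */

        size_t chunk = mleft < fleft ? mleft : fleft;
        ssize_t n = pread(fd, (char *)iov[mi].iov_base + mo, chunk,
                          xtv[fi].xtv_off + (off_t)fo);
        if (n < 0) {
            if (errno == EINTR)
                continue;                           /* retry this piece */
            return total > 0 ? total : -1;
        }
        if (n == 0)
            break;                                  /* end of file */
        total += n;
        mo += (size_t)n;
        fo += (size_t)n;
    }
    return total;
}
```

For example, suppose the file holds "0123456789ABCDEFGHIJ", the memory fragments are 3 and 5 bytes long, and the file extents are {2,4} and {10,4}. The first buffer then receives "234" and the second receives "5ABCD".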
>> 
>>     Cheers,
>>               Eric
>> 
>> 
>>   
>>> -----Original Message-----
>>> From: lustre-devel-bounces at lists.lustre.org
>>> [mailto:lustre-devel-bounces at lists.lustre.org] On Behalf Of Tom.Wang
>>> Sent: 12 July 2008 4:38 PM
>>> To: Peter Braam
>>> Cc: lustre-devel
>>> Subject: Re: [Lustre-devel] Vector I/O api
>>> 
>>> 
>>> Peter Braam wrote:
>>>     
>>>> Tom -
>>>> 
>>>> In a recent call with CERN, a request came up for a call that can
>>>> transfer, in parallel, an array of extents in a single file to a
>>>> list of buffers and vice versa.
>>>> This call should be executed with read-ahead disabled; it will usually
>>>> be made when the user is well informed of the I/O that is about to
>>>> take place.
>>>> Is this easy to get into the Lustre client (using our I/O library)?
>>>> Do you already have this for MPI-IO use?
>>>> 
>>>> Thanks.
>>>> 
>>>> Peter
>>>>       
>>> Hello, Peter
>>> 
>>> If you mean providing this list-buffer read/write API in MPI through
>>> our library, it is easy. MPI already provides such an API: you can
>>> define suitable discontiguous buf_type and file_type datatypes for
>>> these extents and use MPI_File_read_all/MPI_File_write_all to
>>> read/write all of these buffers in one call. We only need to disable
>>> read-ahead here, so it should be easy to get into our I/O library.
>>> 
>>> But if you mean providing such an API in llite, I am not sure it is
>>> easy, because IMHO we could only implement such a non-POSIX API via
>>> ioctl. Wouldn't that always impose a page-size limit on transferring
>>> the buffers? Perhaps I am misunderstanding something here.
>>> 
>>> Thanks
>>> WangDi
>>>     
>>> This kind of list-buffer transfer can be implemented with a proper
>>> MPI file view.
>>>     
>>>> ------------------------------------------------------------------------
>>>> 
>>>> _______________________________________________
>>>> Lustre-devel mailing list
>>>> Lustre-devel at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>>>>   
>>>>       
>>> -- 
>>> Regards,
>>> Tom Wangdi    
>>> --
>>> Sun Lustre Group
>>> System Software Engineer
>>> http://www.sun.com
>>> 
>>>     
>> 
>>   
> 




