[Lustre-discuss] [ROMIO Req #940] a new Lustre ADIO driver
pascal.deveze at bull.net
Fri Jun 12 08:35:17 PDT 2009
Rob,
I've made four modifications to the new Lustre ADIO driver. With them, all
the ROMIO tests pass.
The application "coll_test" also passes on 2 nodes (4 processes, 2 OSTs)
with the dimension of the 3D array varying from 1 to 300.
The version I use comes from mpich2-1.1.
Here is a quick explanation of my four modifications:
1) The calculation of avail_cb_nodes in ADIOI_LUSTRE_Get_stripping_info()
For example, when nprocs_for_coll=2, stripe_count=2 and CO=1,
avail_cb_nodes is set to 1, whereas the value 2 should be used.
I propose to replace the corresponding lines with:
/* avail_cb_nodes should divide stripe_count exactly */
while (stripe_count % avail_cb_nodes) {
    avail_cb_nodes--;
}
2) The type of the parameter len_list_ptr has changed in include/adioi.h,
so I propose to change:
int **len_list_ptr;
to
ADIO_Offset **len_list_ptr;
in ad_lustre_aggregate.c and ad_lustre_wrcoll.c.
3) Use of buf_idx[] in ADIOI_LUSTRE_W_Exchange_data
I had a lot of trouble with the array of buffer offsets buf_idx[]:
running "coll_test" with different dimensions of the 3D array detected
many errors.
I solved the problems by using a single offset, buf_idx[0], initialized
to 0 in "ADIOI_LUSTRE_Calc_my_req" and used and updated in
"ADIOI_LUSTRE_W_Exchange_data":
if (send_size[i]) {
    MPI_Isend(((char *) buf) + buf_idx[0], send_size[i],
              MPI_BYTE, i, myrank + i + 100 * iter, fd->comm,
              send_req + j);
    j++;
    buf_idx[0] += send_size[i];
}
Of course, a plain scalar variable could be used instead, but that would
require more changes.
This solution works with coll_test (run with a wide range of dimensions
of the 3D array, on 4 nodes and 2 OSTs). Perhaps this solution is too
simple and I have missed something. What is your opinion?
4) Macro ADIOI_BUFFERED_WRITE_WITHOUT_READ in ad_lustre_wrstr.c (line 213)
This macro causes an abort in the tests "i_concontig" and "noncontig".
In this macro, ADIO_WriteContig was called many times with writebuf_len
equal to 0.
The intent is to copy (with memcpy) as much data as possible into
writebuf to fill a stripe, and only then to call ADIO_WriteContig. As far
as I understand it, this is not what the macro did.
I replaced this macro with a call to:
memcpy(writebuf + req_off - writebuf_off, (char *)buf + user_off, req_len);
This modification works only if all of the user data can be copied into
the write buffer (writebuf), i.e. if bufsize <= writebuf_len.
If bufsize is larger, a loop has to be introduced.
What do you think about this change?
Attached are my modifications against mpich2-1.1. Questions and comments
are welcome.
Regards,
Pascal DEVEZE
R&D HPC
Bull France
(See attached file: Modifications-vs-1.1)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Modifications-vs-1.1
Type: application/octet-stream
Size: 3813 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090612/8c35b9e7/attachment.obj>