[Lustre-discuss] [ROMIO Req #940] a new Lustre ADIO driver

pascal.deveze at bull.net
Fri Jun 12 08:35:17 PDT 2009


Rob,

I've made four modifications to the new Lustre ADIO driver. With them,
all the ROMIO tests pass.

The application "coll_test" also passes on 2 nodes (4 processes, 2 OSTs)
with the dimension of the 3D array varying from 1 to 300.

The version I use comes from mpich2-1.1.

Here is a quick explanation of my four modifications:

1) The calculation of avail_cb_nodes in ADIOI_LUSTRE_Get_stripping_info()
   For example, when nprocs_for_coll=2, stripe_count=2 and CO=1,
   avail_cb_nodes is set to 1, whereas the value 2 should be used.

   I propose to replace those lines with:

       /* avail_cb_nodes should divide stripe_count exactly */
       while (stripe_count % avail_cb_nodes) {
           avail_cb_nodes--;
       }
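   A minimal sketch of the proposed loop, wrapped in a hypothetical helper
   fit_cb_nodes() (not a ROMIO function) so it can be exercised on its own.
   It assumes the caller has already computed a starting candidate of at
   least 1; how the driver derives that candidate from nprocs_for_coll,
   stripe_count and CO is not shown here. If the candidate is 2 and
   stripe_count=2, the loop leaves it at 2, the value item 1 asks for.

```c
#include <assert.h>

/* Hypothetical helper (not a ROMIO function): apply the proposed loop to
 * a starting candidate and return the largest value <= that candidate
 * that divides stripe_count exactly. */
int fit_cb_nodes(int avail_cb_nodes, int stripe_count)
{
    /* 1 divides every stripe_count, so a positive start guarantees
     * that the loop terminates. */
    assert(avail_cb_nodes >= 1 && stripe_count >= 1);

    /* avail_cb_nodes should divide stripe_count exactly */
    while (stripe_count % avail_cb_nodes) {
        avail_cb_nodes--;
    }
    return avail_cb_nodes;
}
```

   For instance, fit_cb_nodes(5, 6) tries 5, then 4, then 3, and returns 3.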


2) The type of the parameter len_list_ptr has changed in include/adioi.h,
   so I propose to change

               int **len_list_ptr;
      to
               ADIO_Offset **len_list_ptr;

   in ad_lustre_aggregate.c and ad_lustre_wrcoll.c.
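   Why the two declarations must agree can be shown with a small sketch.
   If the prototype says ADIO_Offset ** but a definition still says
   int **, the compiler sees incompatible pointer types; and wherever a
   cast hides the mismatch, the callee would store 32-bit ints into an
   array the caller reads as 64-bit offsets. The sketch assumes ADIO_Offset
   is a 64-bit signed integer (a local typedef stands in for ROMIO's own
   definition), and make_len_list() is a hypothetical function that only
   mirrors the out-parameter pattern of len_list_ptr.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: local stand-in for ROMIO's ADIO_Offset, taken here to be
 * a 64-bit signed integer. */
typedef long long ADIO_Offset;

/* Hypothetical function using the same out-parameter pattern as
 * len_list_ptr: it allocates `count` lengths, fills each with `len`, and
 * returns the array through len_list_ptr. Because the parameter is
 * ADIO_Offset **, the elements written here have the same width as the
 * ones the caller reads, so lengths above INT_MAX survive intact.
 * Returns 0 on success, -1 if the allocation fails. */
int make_len_list(ADIO_Offset **len_list_ptr, int count, ADIO_Offset len)
{
    assert(count > 0);
    ADIO_Offset *len_list = malloc((size_t) count * sizeof *len_list);
    if (len_list == NULL)
        return -1;
    for (int i = 0; i < count; i++)
        len_list[i] = len;
    *len_list_ptr = len_list;
    return 0;
}
```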


3) Use of buf_idx[] in ADIOI_LUSTRE_W_Exchange_data()

   I had a lot of trouble with the array buf_idx[]: running "coll_test"
   with various dimensions of the 3D array revealed many errors.

   I solved the problems by using a single entry, buf_idx[0], initialized
   to 0 in ADIOI_LUSTRE_Calc_my_req() and then used and advanced in
   ADIOI_LUSTRE_W_Exchange_data():
            if (send_size[i]) {
                MPI_Isend(((char *) buf) + buf_idx[0], send_size[i],
                          MPI_BYTE, i, myrank + i + 100 * iter, fd->comm,
                          send_req + j);
                j++;
                buf_idx[0] += send_size[i];
            }
   Of course, a plain scalar variable could be used instead, but that
   would require more changes.
   This solution works fine with coll_test (run over a wide range of 3D
   array dimensions, on 4 nodes and 2 OSTs). Maybe this solution is too
   simple and I have missed something. What is your opinion?
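   The single running offset can be illustrated without MPI. The
   hypothetical running_send_offsets() below (not ROMIO code) records, for
   each destination rank, where an MPI_Isend() would start reading in the
   user buffer, and advances the offset by send_size[i] just as buf_idx[0]
   is advanced above. The sketch assumes that the bytes destined for
   successive ranks lie back to back in buf, in rank order; that
   assumption is what makes one offset sufficient, and it is worth checking
   against how ADIOI_LUSTRE_Calc_my_req() builds the requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not ROMIO code): compute the start offset in the
 * user buffer for each destination rank using one running offset.
 * Assumes the bytes for successive ranks are contiguous in buf and in
 * rank order. Returns the total number of bytes covered. */
size_t running_send_offsets(const int *send_size, int nprocs, size_t *start)
{
    size_t buf_off = 0;   /* plays the role of buf_idx[0] */

    for (int i = 0; i < nprocs; i++) {
        assert(send_size[i] >= 0);
        /* An MPI_Isend() to rank i would read from buf + buf_off. */
        start[i] = buf_off;
        if (send_size[i])
            buf_off += (size_t) send_size[i];
    }
    return buf_off;
}
```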


4) Macro ADIOI_BUFFERED_WRITE_WITHOUT_READ in ad_lustre_wrstr.c (line 213)
   This macro causes an abort in the tests "i_noncontig" and "noncontig".

   In this macro, ADIO_WriteContig() was called many times with
   writebuf_len equal to 0. The intent is to copy as much data as
   possible into writebuf (with memcpy) to fill a stripe, and only then to
   call ADIO_WriteContig(). As far as I understand it, the macro did not
   do that.
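   Filling a stripe before each write comes down to knowing how many bytes
   remain before the next stripe boundary. The minimal sketch below uses a
   hypothetical split_at_stripes() helper (not part of ROMIO) to cut a file
   range into chunks that each stay within one stripe; by construction no
   chunk is zero-length, so a write issued per chunk would never have
   length 0.

```c
#include <assert.h>

/* Hypothetical helper (not ROMIO code): split the byte range
 * [off, off + len) at stripe boundaries so each chunk stays inside one
 * stripe. chunk_len must have room for max_chunks entries. Returns the
 * number of chunks, or -1 if max_chunks is too small. */
int split_at_stripes(long long off, long long len, long long stripe_size,
                     long long *chunk_len, int max_chunks)
{
    int n = 0;

    assert(off >= 0 && len >= 0 && stripe_size > 0);
    while (len > 0) {
        /* bytes left before the end of the stripe containing off */
        long long room = stripe_size - off % stripe_size;
        long long take = len < room ? len : room;

        if (n == max_chunks)
            return -1;
        chunk_len[n++] = take;
        off += take;
        len -= take;
    }
    return n;
}
```

   With stripe_size=128, a range starting at offset 100 with length 300
   splits into chunks of 28, 128, 128 and 16 bytes.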

   I replaced this macro with a call to

       memcpy(writebuf + req_off - writebuf_off, (char *)buf + user_off, req_len);

   This modification works only if all the user data can be copied into
   the write buffer (writebuf), i.e. if bufsize <= writebuf_len. If
   bufsize is larger, a loop has to be introduced.

   What do you think about this change?
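   One way the loop could look is sketched below, as an illustration under
   stated assumptions rather than a patch. The names (bufwriter, bw_put(),
   bw_flush()) are hypothetical, an in-memory array stands in for the
   file, and bw_flush() stands in for ADIO_WriteContig(). It assumes pieces
   arrive back to back in file order with no holes, so writebuf never
   needs to be read first. Each piece is copied to
   writebuf + req_off - writebuf_off, as in the memcpy above; the buffer is
   flushed whenever it fills, and a flush with nothing staged is skipped,
   so no zero-length write is issued.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical buffered writer; names are illustrative, not ROMIO's. */
typedef struct {
    char *file;              /* in-memory stand-in for the file */
    char *writebuf;          /* staging buffer */
    long long cap;           /* capacity of writebuf in bytes */
    long long writebuf_off;  /* file offset of writebuf[0] */
    long long writebuf_len;  /* bytes currently staged */
    int nwrites;             /* number of writes actually issued */
} bufwriter;

/* Stand-in for ADIO_WriteContig(): does nothing when nothing is staged. */
static void bw_flush(bufwriter *w)
{
    if (w->writebuf_len == 0)
        return;
    memcpy(w->file + w->writebuf_off, w->writebuf, (size_t) w->writebuf_len);
    w->writebuf_off += w->writebuf_len;
    w->writebuf_len = 0;
    w->nwrites++;
}

/* Stage req_len bytes bound for file offset req_off. Assumes no holes:
 * req_off must equal writebuf_off + writebuf_len on entry. Loops when the
 * piece is larger than the space left in writebuf. */
static void bw_put(bufwriter *w, const char *user, long long req_off,
                   long long req_len)
{
    assert(req_off == w->writebuf_off + w->writebuf_len);
    while (req_len > 0) {
        long long room = w->cap - w->writebuf_len;
        long long n = req_len < room ? req_len : room;

        /* same placement as memcpy(writebuf + req_off - writebuf_off, ...) */
        memcpy(w->writebuf + (req_off - w->writebuf_off), user, (size_t) n);
        w->writebuf_len += n;
        user += n;
        req_off += n;
        req_len -= n;
        if (w->writebuf_len == w->cap)
            bw_flush(w);     /* buffer full: write it out */
    }
}
```

   After the last bw_put(), the caller issues one final bw_flush() for the
   partially filled buffer.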



Attached are my modifications against mpich2-1.1. Questions and comments
are welcome.

Regards,


Pascal DEVEZE
R&D HPC
Bull France


(See attached file: Modifications-vs-1.1)




-------------- next part --------------
A non-text attachment was scrubbed...
Name: Modifications-vs-1.1
Type: application/octet-stream
Size: 3813 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090612/8c35b9e7/attachment.obj>

