[Lustre-discuss] [PATCH] Avoid Lustre failure on temporary failure

Alexey Lyashkov alexey_lyashkov at xyratex.com
Tue Sep 2 03:09:47 PDT 2014


Credits for Lustre? Does that work? Right now it is a strange number with no relation to the real network structure, and it produces over-buffering issues on the server side.
 
On Sep 2, 2014, at 12:22 PM, Zhen, Liang <liang.zhen at intel.com> wrote:

> Yes, I think this is the potential issue with this patch. For each 1M of data, Lustre has 256 fragments (256 pages) on a system with a 4K page size, which means we can have up to (credits x 256) outstanding work requests for each connection. Decreasing max_send_wr may cause ib_post_send() failures under heavy workload.
> 
> I understand it may be a problem for the low-level stack to allocate a big chunk of space, which can cause memory allocation failures. The solution is to enable map_on_demand and use FMR. However, enabling this on some nodes will prevent them from joining the cluster if other nodes do not have map_on_demand enabled. We already have a patch for this which is pending review; please check it (LU-3322).
> 
> Thanks
> Liang
> 
> From: David McMillen <mcmillen at cray.com>
> Date: Sunday, August 31, 2014 at 6:48 PM
> To: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>, Eli Cohen <eli at dev.mellanox.co.il>
> Subject: Re: [Lustre-discuss] [PATCH] Avoid Lustre failure on temporary failure
> 
> Has this been tested with a significant I/O load?  We tried a similar approach but ran into subsequent errors and connection drops when ib_post_send() failed.  The code assumes that the original init_qp_attr->cap.max_send_wr value succeeded.  Is there a second part to this patch?
> 
> Dave
> 
> On Sun, Aug 31, 2014 at 2:53 AM, Eli Cohen <eli at dev.mellanox.co.il> wrote:
> 
>> Lustre code tries to create a QP with max_send_wr which depends on a module
>> parameter.  The device capabilities do provide the maximum number of send work
>> requests that the device supports but the actual number of work requests that
>> can be supported in a specific case depends on other characteristics of the
>> work queue, the transport type, etc. This is in compliance with the IB spec:
>> 
>> 11.2.1.2 QUERY HCA
>> Description:
>> Returns the attributes for the specified HCA.
>> The maximum values defined in this section are guaranteed
>> not-to-exceed values. It is possible for an implementation to allocate
>> some HCA resources from the same space. In that case, the maximum
>> values returned are not guaranteed for all of those resources
>> simultaneously.
>> 
>> This patch tries to decrease the number of requested work requests to a level
>> that can be supported by the HCA. This prevents unnecessary failures.
>> 
>> Signed-off-by: Eli Cohen <eli at mellanox.com>
>> ---
>> lnet/klnds/o2iblnd/o2iblnd.c | 25 ++++++++++++++++++-------
>> 1 file changed, 18 insertions(+), 7 deletions(-)
>> 
>> diff --git a/lnet/klnds/o2iblnd/o2iblnd.c b/lnet/klnds/o2iblnd/o2iblnd.c
>> index 4061db00cba2..ef1c6e07cb45 100644
>> --- a/lnet/klnds/o2iblnd/o2iblnd.c
>> +++ b/lnet/klnds/o2iblnd/o2iblnd.c
>> @@ -736,6 +736,7 @@ kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
>>      int                     cpt;
>>      int                     rc;
>>      int                     i;
>> +     int                     orig_wr;
>> 
>>      LASSERT(net != NULL);
>>      LASSERT(!in_interrupt());
>> @@ -862,13 +863,23 @@ kiblnd_create_conn(kib_peer_t *peer, struct rdma_cm_id *cmid,
>> 
>>      conn->ibc_sched = sched;
>> 
>> -        rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
>> -        if (rc != 0) {
>> -                CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d\n",
>> -                       rc, init_qp_attr->cap.max_send_wr,
>> -                       init_qp_attr->cap.max_recv_wr);
>> -                goto failed_2;
>> -        }
>> +     orig_wr = init_qp_attr->cap.max_send_wr;
>> +     do {
>> +             rc = rdma_create_qp(cmid, conn->ibc_hdev->ibh_pd, init_qp_attr);
>> +             if (!rc || init_qp_attr->cap.max_send_wr < 16)
>> +                     break;
>> +
>> +             init_qp_attr->cap.max_send_wr /= 2;
>> +     } while (rc);
>> +     if (rc != 0) {
>> +             CERROR("Can't create QP: %d, send_wr: %d, recv_wr: %d\n",
>> +                    rc, init_qp_attr->cap.max_send_wr,
>> +                    init_qp_attr->cap.max_recv_wr);
>> +             goto failed_2;
>> +     }
>> +     if (orig_wr != init_qp_attr->cap.max_send_wr)
>> +             pr_info("original send wr %d, created with %d\n",
>> +                     orig_wr, init_qp_attr->cap.max_send_wr);
>> 
>>         LIBCFS_FREE(init_qp_attr, sizeof(*init_qp_attr));
>> 
> 



