[lustre-discuss] server_bulk_callback errors until server reboots

Hebenstreit, Michael michael.hebenstreit at intel.com
Thu Jun 7 10:00:06 PDT 2018


Thanks. I do not have different IB types within one fabric. With this info, however, I found a few nodes that showed that error, but they do not match the errors I see on the server.

BTW, the problem was resolved on one file system after upgrading to Lustre 2.11.

From: Raj [mailto:rajgautam at gmail.com]
Sent: Thursday, June 07, 2018 10:36 AM
To: Hebenstreit, Michael <michael.hebenstreit at intel.com>
Cc: White, Cliff <cliff.white at intel.com>; lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] server_bulk_callback errors until server reboots

I have seen this error when we had a mix of FDR (mlx4) and EDR (mlx5) devices in the Lustre network. A server_bulk_callback error should have a corresponding client_bulk_callback error on a client.

http://wiki.lustre.org/Infiniband_Configuration_Howto
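To check the point that each server_bulk_callback error should have a matching client_bulk_callback error on some client, one could scan every client's kernel log for that function name. The sketch below is a hypothetical helper (nodes_with_bulk_errors is not a Lustre tool); it assumes each client's log text has already been gathered, e.g. from dmesg, into a dict keyed by hostname.

```python
def nodes_with_bulk_errors(logs_by_node, marker="client_bulk_callback"):
    """Return {node: count} for nodes whose kernel log mentions `marker`.

    `logs_by_node` maps a hostname (hypothetical) to that node's kernel
    log text. Nodes with no matching line are left out of the result.
    """
    hits = {}
    for node, text in logs_by_node.items():
        # Count lines containing the callback name; no format is assumed
        # beyond the function name appearing somewhere in the line.
        n = sum(1 for line in text.splitlines() if marker in line)
        if n:
            hits[node] = n
    return hits
```

Matching by function name rather than by timestamp is deliberate: plain dmesg timestamps are seconds since each node's boot, so they cannot be compared directly across hosts.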
On Thu, Jun 7, 2018 at 11:24 AM Hebenstreit, Michael <michael.hebenstreit at intel.com> wrote:
No, clients do not show any issues.

-----Original Message-----
From: White, Cliff
Sent: Thursday, June 07, 2018 9:26 AM
To: Hebenstreit, Michael <michael.hebenstreit at intel.com>; lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] server_bulk_callback errors until server reboots


On 6/7/18, 7:00 AM, "lustre-discuss on behalf of Hebenstreit, Michael" <lustre-discuss-bounces at lists.lustre.org on behalf of michael.hebenstreit at intel.com> wrote:

    Hello

    I now have two Lustre systems that suddenly show this error: on a single OST, the kernel log keeps filling with messages

    [58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc ffff880524f7e000
    [58865.328317] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc ffff880cab4ec800
    [58865.340792] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 5, status -61, desc ffff880524f7c600
    [58865.353167] LustreError: 123640:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc ffff880cab4ec800
    [58865.365503] LustreError: 123641:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc ffff880524f7c600

    until the server reboots. Clients are on 2.11/RH7.5 and servers are on 2.7.19.10/RH7.4. Has anyone experienced this before?
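A minimal sketch for summarizing such a log, assuming the exact message format shown above. Two decodings are assumptions worth checking against your own source tree: status -61 is read as -ENODATA (61 is ENODATA on Linux), and event types 3 and 5 are mapped to REPLY and SEND using the LNet event-kind numbering as it appears in Lustre 2.x headers. summarize() is a hypothetical helper, not part of Lustre.

```python
import re
from collections import Counter

# Matches lines such as:
# [58858.365663] LustreError: 123642:0:(events.c:447:server_bulk_callback()) event type 3, status -61, desc ffff880524f7e000
BULK_RE = re.compile(
    r"server_bulk_callback\(\)\) event type (?P<type>\d+), "
    r"status (?P<status>-?\d+), desc (?P<desc>[0-9a-f]+)"
)

# ASSUMPTION: LNet event-kind numbering as found in Lustre 2.x headers;
# verify against the Lustre version actually running on the servers.
LNET_EVENT_KIND = {1: "GET", 2: "PUT", 3: "REPLY", 4: "ACK", 5: "SEND", 6: "UNLINK"}

# Linux errno name for the status seen here; hard-coded because errno
# numbers differ between operating systems.
LINUX_ERRNO = {61: "ENODATA"}


def summarize(log_text):
    """Count server_bulk_callback errors by (event kind, errno name)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = BULK_RE.search(line)
        if m is None:
            continue  # not a server_bulk_callback line
        etype = int(m.group("type"))
        status = int(m.group("status"))
        kind = LNET_EVENT_KIND.get(etype, f"type{etype}")
        # Kernel status codes are negative errno values.
        err = LINUX_ERRNO.get(-status, str(status))
        counts[(kind, err)] += 1
    return counts
```

For the five log lines above, this yields three REPLY/ENODATA events and two SEND/ENODATA events.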

There should be corresponding error messages on your clients. Have you checked there?
cliffw

    Thanks
    Michael

    ------------------------------------------------------------------------
    Michael Hebenstreit                 Senior Cluster Architect
    Intel Corporation, MS: RR1-105/H14  Core and Visual Compute Group (DCE)
    4100 Sara Road                      Tel.:   +1 505-794-3144
    Rio Rancho, NM 87124
    UNITED STATES                       E-mail: michael.hebenstreit at intel.com



    _______________________________________________
    lustre-discuss mailing list
    lustre-discuss at lists.lustre.org
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
