[lustre-discuss] ldlm_enqueue and ldlm_cli_enqueue errors

Murshid Azman murshid.azman at gmail.com
Tue Sep 27 21:43:53 PDT 2016


Hi,

For reference, these are the standard Linux errno values:

-2 is ENOENT: the requested file or directory does not exist
-11 is EAGAIN: try again
-95 is EOPNOTSUPP: operation not supported on transport endpoint
-116 is ESTALE: stale file handle
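
If you want to tally these from syslog, here is a rough userspace sketch (not Lustre code) that pulls the trailing return code out of a log line and names it. The helpers rc_name() and rc_from_log_line() are made up for illustration, the numbering assumes Linux, and only the four codes above are covered:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map a negative return code to its errno name
 * (Linux numbering). Only the four codes listed above are handled;
 * anything else comes back as "UNKNOWN". */
static const char *rc_name(int rc)
{
        switch (-rc) {
        case ENOENT:     return "ENOENT";
        case EAGAIN:     return "EAGAIN";
        case EOPNOTSUPP: return "EOPNOTSUPP";
        case ESTALE:     return "ESTALE";
        default:         return "UNKNOWN";
        }
}

/* Hypothetical helper: extract the return code that ends a log line such as
 * "... ldlm_cli_enqueue: -116" or "... failed with -11.".
 * It looks at the last '-' in the line and returns 0 when no negative
 * number follows it. */
static int rc_from_log_line(const char *line)
{
        const char *dash = strrchr(line, '-');
        long val;

        if (dash == NULL)
                return 0;
        val = strtol(dash, NULL, 10);   /* parses "-116", stops at '.' or end */
        return val < 0 ? (int)val : 0;
}
```

For example, rc_name(rc_from_log_line(line)) on a line ending in "failed with -95." gives "EOPNOTSUPP". It assumes the code is the last thing on the line, which holds for the messages quoted below but may not hold for other LustreError formats.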

We had this issue when we were running 2.5.3. It didn't seem to be causing
us any problems, so we patched mdc_enqueue() in lustre/mdc/mdc_locks.c (the
file named in your log lines) to something like this to suppress the error
message (my syntax could be wrong):

        if (rc < 0) {
                /* Log everything except -ENOENT (-2); cleanup still runs. */
                if (rc != -ENOENT)
                        CERROR("ldlm_cli_enqueue: %d\n", rc);
                mdc_clear_replay_flag(req, rc);
                ptlrpc_req_finished(req);
                RETURN(rc);
        }

It was no longer an issue after we switched to master.

Hope this helps.

Murshid.


On Wed, Sep 28, 2016 at 4:26 AM, K. Scott Rowe <krowe at nrao.edu> wrote:

>
> We migrated from an MGS/MDS and OSSes running lustre-1.8.5 to a
> completely new MGS/MDS and OSSes running lustre-2.4.3 on Sep. 24,
> 2016. We use a mix of lustre-1.8.9 and lustre-2.4.3 clients, both of
> which mount lustre with the following options
> "defaults,noauto,user_xattr,flock".  Since the migration, we have seen
> various ldlm_enqueue and ldlm_cli_enqueue errors like the following...
>
> These are from our lustre-2.4.3 clients connected with InfiniBand
>
>   Sep 26 06:37:48 nmpost047 kernel: LustreError: 11-0:
> aoclst03-MDT0000-mdc-ffff88101f7c7000: Communicating with
> 192.168.1.30@o2ib, operation ldlm_enqueue failed with -116.
>   Sep 26 06:37:48 nmpost047 kernel: LustreError:
> 38632:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -116
>
>   Sep 26 08:46:58 nmpost060 kernel: LustreError: 11-0:
> aoclst03-MDT0000-mdc-ffff8810622d5c00: Communicating with
> 192.168.1.30@o2ib, operation ldlm_enqueue failed with -95.
>   Sep 26 08:46:58 nmpost060 kernel: LustreError:
> 124585:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -95
>
>   Sep 26 09:42:01 nmpost036 kernel: LustreError:
> 21189:0:(mdc_locks.c:848:mdc_enqueue()) ldlm_cli_enqueue: -2
>
>   Sep 26 20:37:08 nmpost017 kernel: LustreError: 11-0:
> aoclst03-MDT0000-mdc-ffff880804845000: Communicating with
> 192.168.1.30@o2ib, operation ldlm_enqueue failed with -11.
>
>
> These are from our lustre-1.8.9 clients connected with 1Gb and LNET
> routers
>
>   Sep 26 12:57:45 tofino kernel: LustreError: 11-0: an error occurred
> while communicating with 192.168.1.30@o2ib. The ldlm_enqueue operation
> failed with -11
>
>
> I saw a reference to some of these messages in
> https://jira.hpdd.intel.com/browse/LU-4705 but it was not clear how
> serious the errors are.  Can anyone tell me if these are
> errors we should worry about or are they more like warnings that
> should be ignored?  And if they should be ignored, is there a way to
> disable them?
>