[lustre-discuss] soft lockups on lustre client 2.8.0 or 2.10.0

Deon Borman deon at blackginger.tv
Mon Oct 9 03:28:01 PDT 2017


Hi John,

You might be hitting LU-9230. Some of the workarounds suggested in the
comments on the Jira page worked for us after some tweaking.

Regards
Deon

On 07/10/2017 05:28, john casu wrote:
> With a 2.8.0 or 2.10.0 client running on CentOS 7.1511 (7.2), when I run
> IOR from a single node with 4 MPI processes & 100G file size, I get
> messages like this:
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#5 stuck for 23s! [ptlrpcd_00_07:32761]
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#7 stuck for 23s! [socknal_sd00_03:32742]
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#9 stuck for 23s! [ptlrpcd_01_05:307]
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#14 stuck for 23s! [ptlrpcd_01_10:312]
>
> Message from syslogd at c0 at Oct  6 21:12:43 ...
>  kernel:BUG: soft lockup - CPU#15 stuck for 23s! [ptlrpcd_01_08:310]
>
> Then my ssh session is terminated, and I'm unable to log back in again.
> Running with 2 threads works just fine, so I'm guessing I'm dealing with
> some resource issue (probably memory).
>
> Anyone have any idea?
>
> thanks,
> -john c
>
> p.s.
>
> FYI, for completeness, the Lustre server side is 2 OSS & 2 MDS/MGS
> failover pairs running 2.8.0 over ZFS, and appears to show no ill effects.
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
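
When comparing runs (for example before and after trying a workaround), it can help to tally which kernel threads report soft lockups. Below is a minimal Python sketch, assuming console messages in the same format as the ones quoted above. The function names (parse_lockups, thread_family, summarize) are illustrative only and are not part of any Lustre tooling; the sample text is copied from the quoted report.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<thread>[^:\]]+):(?P<pid>\d+)\]"
)

# Sample input: the six messages from the quoted report.
SAMPLE = """\
 kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]
 kernel:BUG: soft lockup - CPU#5 stuck for 23s! [ptlrpcd_00_07:32761]
 kernel:BUG: soft lockup - CPU#7 stuck for 23s! [socknal_sd00_03:32742]
 kernel:BUG: soft lockup - CPU#9 stuck for 23s! [ptlrpcd_01_05:307]
 kernel:BUG: soft lockup - CPU#14 stuck for 23s! [ptlrpcd_01_10:312]
 kernel:BUG: soft lockup - CPU#15 stuck for 23s! [ptlrpcd_01_08:310]
"""


def parse_lockups(text):
    """Return one dict per soft-lockup message found in text."""
    return [
        {
            "cpu": int(m.group("cpu")),
            "secs": int(m.group("secs")),
            "thread": m.group("thread"),
            "pid": int(m.group("pid")),
        }
        for m in LOCKUP_RE.finditer(text)
    ]


def thread_family(name):
    """Strip trailing purely numeric suffixes: ptlrpcd_00_04 -> ptlrpcd."""
    return re.sub(r"(_\d+)+$", "", name)


def summarize(text):
    """Count soft-lockup messages per thread family."""
    return Counter(thread_family(e["thread"]) for e in parse_lockups(text))


if __name__ == "__main__":
    for family, count in summarize(SAMPLE).most_common():
        print(f"{family}: {count}")
```

On the quoted messages this groups five reports under ptlrpcd and one under socknal_sd00, so most of the stuck threads are Lustre client RPC daemons and one is a socket LND scheduler thread.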
