[lustre-discuss] soft lockups on lustre client 2.8.0 or 2.10.0

john casu john at chiraldynamics.com
Mon Oct 9 06:45:35 PDT 2017


Thanks for this,

-john c.

On 10/9/17 3:28 AM, Deon Borman wrote:
> Hi John,
>
> You might be hitting LU-9230. Some of the workarounds suggested in the comments on the Jira page worked for us after
> some tweaking.
>
> Regards
> Deon
>
> On 07/10/2017 05:28, john casu wrote:
>> With the 2.8.0 or 2.10.0 client running on CentOS 7.1511 (7.2), when I run IOR
>> from a single node with 4 MPI processes and a 100G file size, I get messages like this:
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#5 stuck for 23s! [ptlrpcd_00_07:32761]
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#7 stuck for 23s! [socknal_sd00_03:32742]
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#9 stuck for 23s! [ptlrpcd_01_05:307]
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#14 stuck for 23s! [ptlrpcd_01_10:312]
>>
>> Message from syslogd@c0 at Oct  6 21:12:43 ...
>>  kernel:BUG: soft lockup - CPU#15 stuck for 23s! [ptlrpcd_01_08:310]
>>
>> Then my ssh session is terminated, and I'm unable to log back in.
>> Running with 2 threads works just fine, so I'm guessing I'm dealing with
>> some resource issue (probably memory).
>>
>> Anyone have any idea?
>>
>> thanks,
>> -john c
>>
>> p.s.
>>
>> FYI, for completeness, the Lustre server side is 2 OSS and 2 MDS/MGS failover pairs
>> running 2.8.0 over ZFS, and it appears to show no ill effects.
>>
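
When triaging reports like the one quoted above, it can help to tally the soft-lockup messages by CPU and by kernel thread. The following is a minimal Python sketch, assuming the console or syslog output has been saved as text. The name summarize_lockups and the regular expression are illustrative assumptions, not part of Lustre or of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted in this thread, e.g.
#   kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]
# The pattern is unanchored, so mail quote prefixes such as ">> " are ignored.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<thread>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(text):
    """Return (events, families) for all soft-lockup messages in text.

    events   -- list of (cpu, seconds, thread_name, pid) tuples
    families -- Counter keyed by the thread-name prefix before the first "_"
                (hypothetical grouping, e.g. "ptlrpcd_00_04" -> "ptlrpcd")
    """
    events = []
    families = Counter()
    for m in LOCKUP_RE.finditer(text):
        thread = m.group("thread")
        events.append(
            (int(m.group("cpu")), int(m.group("secs")), thread, int(m.group("pid")))
        )
        families[thread.split("_", 1)[0]] += 1
    return events, families
```

Applied to the six messages quoted above, this groups five events under "ptlrpcd" and one under "socknal", which makes it easy to see at a glance which thread families are stuck.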
