[lustre-discuss] soft lockups on lustre client 2.8.0 or 2.10.0

john casu john at chiraldynamics.com
Fri Oct 6 20:28:47 PDT 2017


With either the 2.8.0 or the 2.10.0 client running on CentOS 7.2 (1511), when I run IOR
from a single node with 4 MPI processes and a 100G file size, I get messages like this:

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#4 stuck for 23s! [ptlrpcd_00_04:32758]

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#5 stuck for 23s! [ptlrpcd_00_07:32761]

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#7 stuck for 23s! [socknal_sd00_03:32742]

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#9 stuck for 23s! [ptlrpcd_01_05:307]

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#14 stuck for 23s! [ptlrpcd_01_10:312]

Message from syslogd at c0 at Oct  6 21:12:43 ...
  kernel:BUG: soft lockup - CPU#15 stuck for 23s! [ptlrpcd_01_08:310]

Then my ssh session is terminated, and I'm unable to log back in.
Running with 2 threads works just fine, so I'm guessing I'm dealing with
some resource issue (probably memory).

Does anyone have any ideas?

thanks,
-john c

p.s.

FYI, for completeness: the Lustre server side consists of 2 OSS and 2 MDS/MGS failover pairs
running 2.8.0 over ZFS, and it appears to show no ill effects.
