[lustre-discuss] OST with failover.mode=failout not failing out

Christian Kuntz c.kuntz at opendrives.com
Tue Apr 28 16:10:27 PDT 2020


Hello all,

I'm currently running 2.13.0 on Debian Buster with ZFS OSDs. My current
setup is a simple cluster with all the components on the same node. Though
the OST is marked as "failout", operations still hang indefinitely when
they should fail after a timeout.
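
My understanding is that the stored parameters can be read back from the
target without modifying anything, along these lines (ZFS/DATASET is a
placeholder for the actual pool/dataset):

  # Dry run: prints the target's stored configuration, changes nothing.
  # The Parameters: line should, as far as I know, list failover.mode=failout.
  tunefs.lustre --dryrun ZFS/DATASET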

Predictably, I get the following error when I try to `touch` a file on
the missing OST:
Lustre: 16507:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1588114247/real 1588114247] req@00000000978c5ab1 x1665257677278528/t0(0) o2->foobar-OST0000-osc-ffff8e9de9263800@192.168.7.229@tcp1:28/4 lens 440/432 e 0 to 1 dl 1588114254 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:''
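
As far as I can tell, the timeout in that message is governed by the usual
RPC timeout tunables; a read-only check on the client (I'm not claiming any
of these are misconfigured) would be something like:

  # Global RPC timeout plus the adaptive-timeout bounds
  lctl get_param timeout at_min at_max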

Then there is the hung `touch` task in the kernel as well:
[ 1177.541894] LustreError: 49487:0:(mgs_llog.c:4313:mgs_write_log_param()) err -22 on param 'sys.timeout'
[ 1177.542627] LustreError: 49487:0:(mgs_handler.c:1032:mgs_iocontrol()) MGS: setparam err: rc = -22
[ 1209.388728] INFO: task touch:48422 blocked for more than 120 seconds.
[ 1209.389779]       Tainted: P           O      4.19.0-9-amd64 #1 Debian 4.19.98-1
[ 1209.390636] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.391482] touch           D    0 48422  48421 0x00000000
[ 1209.391485] Call Trace:
[ 1209.391495]  ? __schedule+0x2a2/0x870
[ 1209.391497]  schedule+0x28/0x80
[ 1209.391499]  schedule_timeout+0x26d/0x390
[ 1209.391646]  ? ptlrpc_set_add_new_req+0x100/0x180 [ptlrpc]
[ 1209.391649]  wait_for_completion+0x11f/0x190
[ 1209.391655]  ? wake_up_q+0x70/0x70
[ 1209.391688]  osc_io_setattr_end+0xcf/0x1f0 [osc]
[ 1209.391710]  ? lov_io_iter_fini_wrapper+0x40/0x40 [lov]
[ 1209.391771]  cl_io_end+0x53/0x130 [obdclass]
[ 1209.391781]  lov_io_end_wrapper+0xc3/0xd0 [lov]
[ 1209.391787]  lov_io_call.isra.10+0x7d/0x130 [lov]
[ 1209.391793]  lov_io_end+0x32/0xd0 [lov]
[ 1209.391822]  cl_io_end+0x53/0x130 [obdclass]
[ 1209.391851]  cl_io_loop+0xea/0x1b0 [obdclass]
[ 1209.391917]  cl_setattr_ost+0x278/0x300 [lustre]
[ 1209.391931]  ll_setattr_raw+0xe9b/0xf50 [lustre]
[ 1209.391936]  notify_change+0x2df/0x440
[ 1209.391939]  utimes_common.isra.1+0xdf/0x1b0
[ 1209.391942]  ? __check_object_size+0x162/0x173
[ 1209.391943]  do_utimes+0x13c/0x160
[ 1209.391945]  __x64_sys_utimensat+0x7a/0xc0
[ 1209.391952]  ? lov_read_and_clear_async_rc+0x178/0x310 [lov]
[ 1209.391957]  do_syscall_64+0x53/0x110
[ 1209.391961]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1209.391963] RIP: 0033:0x7f290224a2d3
[ 1209.391968] Code: Bad RIP value.
[ 1209.391969] RSP: 002b:00007ffc62383408 EFLAGS: 00000246 ORIG_RAX: 0000000000000118
[ 1209.391971] RAX: ffffffffffffffda RBX: 00007ffc623848f0 RCX: 00007f290224a2d3
[ 1209.391972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1209.391972] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 1209.391973] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffff9c
[ 1209.391974] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000


It hangs indefinitely, repeating this error until the OST is reattached.
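
As a stop-gap I was considering manually deactivating the OSC for the
missing OST on the client, which (if I understand the manual correctly)
should make I/O to it return errors instead of blocking, though that
shouldn't be necessary if failout were working:

  # Hypothetical workaround: mark the client's OSC for the dead OST inactive
  # so new I/O fails instead of hanging; set active=1 to undo.
  lctl set_param osc.foobar-OST0000-*.active=0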

Initially, the OST was not created with "failover.mode=failout"; I added
the parameter with `mkfs.lustre --param failover.mode=failout ZFS/DATASET`,
followed by a `--writeconf` once the logs reported errors saying it would
be necessary.
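
In case the exact steps matter, my reading of the manual's procedure for
adding a parameter to an existing target is roughly the following (all
names are placeholders, and this sketch reflects my understanding rather
than exactly what I ran):

  # Unmount all clients and targets first.
  tunefs.lustre --param=failover.mode=failout ZFS/DATASET
  tunefs.lustre --writeconf ZFS/MDTDATASET   # regenerate config logs, MDT first
  tunefs.lustre --writeconf ZFS/DATASET      # then each OST
  # Remount MGS/MDT first, then OSTs, then clients.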

I have tried bringing the cluster down entirely and then back up, but the
behavior persists.

Am I missing something? Maybe an lctl parameter, or a mount option? Perhaps
this is a known issue and the OST must be formatted with this option from
the start? So far the only resource I've found is pages 107-108 of the
manual, which outline how to do this.
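
For completeness, the only other checks I know of are whether the parameter
actually landed in the regenerated configuration logs on the MGS, and what
the client's import shows; my understanding of the relevant commands (fsname
`foobar` taken from the log above) is:

  # On the MGS node: list the OST's configuration log and look for the
  # failover.mode record (log name is <fsname>-OST0000, if I have it right).
  lctl --device MGS llog_print foobar-OST0000
  # On the client: connection state for that OST's import.
  lctl get_param osc.foobar-OST0000-*.import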

Thanks for your time and assistance,
Christian
