[lustre-discuss] nodes crash during ior test
Brian Andrus
toomuchit at gmail.com
Fri Aug 4 10:12:59 PDT 2017
All,
I am trying to run some ior benchmarking on a small system.
It only has 2 OSSes.
I have been having trouble where one of the clients intermittently
crashes, writes a crash dump and reboots. Most runs complete, but
roughly one run in five a client reboots, and it is not always the
same client.
The call trace seems to point to lnet:
[72095.973865] Call Trace:
[72095.973892] [<ffffffffa070e856>] ? cfs_percpt_unlock+0x36/0xc0 [libcfs]
[72095.973936] [<ffffffffa0779851>] lnet_return_tx_credits_locked+0x211/0x480 [lnet]
[72095.973973] [<ffffffffa076c770>] lnet_msg_decommit+0xd0/0x6c0 [lnet]
[72095.974006] [<ffffffffa076d0f9>] lnet_finalize+0x1e9/0x690 [lnet]
[72095.974037] [<ffffffffa06baf45>] ksocknal_tx_done+0x85/0x1c0 [ksocklnd]
[72095.974068] [<ffffffffa06c3277>] ksocknal_handle_zcack+0x137/0x1e0 [ksocklnd]
[72095.974101] [<ffffffffa06becf1>] ksocknal_process_receive+0x3a1/0xd90 [ksocklnd]
[72095.974134] [<ffffffffa06bfa6e>] ksocknal_scheduler+0xee/0x670 [ksocklnd]
[72095.974165] [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[72095.974193] [<ffffffffa06bf980>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[72095.974222] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[72095.974244] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[72095.974272] [<ffffffff81697758>] ret_from_fork+0x58/0x90
[72095.974296] [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
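When comparing traces from several crashes, it can help to tally the frames by kernel module. The following is a minimal sketch only, not part of any Lustre tooling: the names FRAME_RE, parse_frames and SAMPLE are hypothetical, and it assumes the frame layout seen in the trace above (timestamp, address, an optional "?" marking a frame the unwinder is unsure of, symbol+offset/size, and an optional module name in brackets). The sample input is a subset of the frames above, including one that the mail client wrapped onto two lines.

```python
import re
from collections import Counter

# One stack frame, e.g.
#   [72095.973973] [<ffffffffa076c770>] lnet_msg_decommit+0xd0/0x6c0 [lnet]
# \s+ also matches newlines, so frames wrapped by a mail client still parse.
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"              # dmesg timestamp
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"            # return address
    r"(?P<unreliable>\?\s+)?"                  # "?" = unwinder is unsure
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"             # absent for core kernel symbols
)

def parse_frames(text):
    """Return a list of dicts, one per stack frame found in text."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "func": m.group("func"),
            "module": m.group("module") or "kernel",
            "reliable": m.group("unreliable") is None,
        })
    return frames

# Subset of the trace from this message (assumed representative).
SAMPLE = """\
[72095.973892] [<ffffffffa070e856>] ? cfs_percpt_unlock+0x36/0xc0 [libcfs]
[72095.973936] [<ffffffffa0779851>]
lnet_return_tx_credits_locked+0x211/0x480 [lnet]
[72095.973973] [<ffffffffa076c770>] lnet_msg_decommit+0xd0/0x6c0 [lnet]
[72095.974006] [<ffffffffa076d0f9>] lnet_finalize+0x1e9/0x690 [lnet]
[72095.974037] [<ffffffffa06baf45>] ksocknal_tx_done+0x85/0x1c0 [ksocklnd]
[72095.974222] [<ffffffff810b0a4f>] kthread+0xcf/0xe0
"""

frames = parse_frames(SAMPLE)
# Count only frames the unwinder considered reliable (no "?" marker).
by_module = Counter(f["module"] for f in frames if f["reliable"])

if __name__ == "__main__":
    for module, count in by_module.most_common():
        print(module, count)
```

Running the same tally over the trace from each crash dump would show whether the first reliable frame is always in lnet or whether it varies between crashes.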
I am currently using Lustre 2.9.59_15_g107b2cb built for kmod.
Is there something I can do to track this down and hopefully remedy it?
Brian Andrus