[lustre-discuss] nodes crash during ior test

Brian Andrus toomuchit at gmail.com
Fri Aug 4 10:12:59 PDT 2017


All,

I am trying to run some ior benchmarking on a small system.

It only has 2 OSSes.
I have been having trouble where one of the clients will, seemingly at 
random, crash, write a crash dump, and reboot. Most runs complete, but 
roughly one run in five a client reboots, and it is not always the same 
client.

The call trace seems to point to lnet:


[72095.973865] Call Trace:
[72095.973892]  [<ffffffffa070e856>] ? cfs_percpt_unlock+0x36/0xc0 [libcfs]
[72095.973936]  [<ffffffffa0779851>] lnet_return_tx_credits_locked+0x211/0x480 [lnet]
[72095.973973]  [<ffffffffa076c770>] lnet_msg_decommit+0xd0/0x6c0 [lnet]
[72095.974006]  [<ffffffffa076d0f9>] lnet_finalize+0x1e9/0x690 [lnet]
[72095.974037]  [<ffffffffa06baf45>] ksocknal_tx_done+0x85/0x1c0 [ksocklnd]
[72095.974068]  [<ffffffffa06c3277>] ksocknal_handle_zcack+0x137/0x1e0 [ksocklnd]
[72095.974101]  [<ffffffffa06becf1>] ksocknal_process_receive+0x3a1/0xd90 [ksocklnd]
[72095.974134]  [<ffffffffa06bfa6e>] ksocknal_scheduler+0xee/0x670 [ksocklnd]
[72095.974165]  [<ffffffff810b1b20>] ? wake_up_atomic_t+0x30/0x30
[72095.974193]  [<ffffffffa06bf980>] ? ksocknal_recv+0x2a0/0x2a0 [ksocklnd]
[72095.974222]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[72095.974244]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[72095.974272]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[72095.974296]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140

I am currently using Lustre 2.9.59_15_g107b2cb built for kmod.

Is there something I can do to track this down and hopefully remedy it?

Brian Andrus


