[Lustre-devel] LNetPoll undefined
Liang Zhen
liang at whamcloud.com
Sun Mar 6 17:35:29 PST 2011
Ravi,
I think the soft lockup is probably because the thread is polling on the EQ and expecting LNET_EVENT_PUT, but the EQ also receives a lot of LNET_EVENT_SEND events (does the server keep calling LNetPut or LNetGet with MDs on the same EQ?). The loop then becomes a busy loop that is always trying to take LNET_LOCK to poll for new events, so the kernel can't schedule the watchdog on some CPUs and raises the warning.
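Pseudo code for the eq_callback approach looks like this:

wait_queue_head_t my_waitq;     /* init_waitqueue_head(&my_waitq) before use */
struct list_head  req_list;     /* INIT_LIST_HEAD(&req_list) before use */

/* EQ callback: called for every event on the EQ */
void my_callback(lnet_event_t *ev)
{
        /* ignore everything except PUT, e.g. LNET_EVENT_SEND */
        if (ev->type != LNET_EVENT_PUT)
                return;

        /* construct request from data in MD */
        req = ....;
        ...
        add_req_to_queue(req, &req_list);
        wake_up(&my_waitq);
}

my_thread()
{
        wait_queue_t wait;
        ...
        rc = LNetEQAlloc(1024, my_callback, &eqh);
        ...
        while (1) {
                /* handle everything queued by my_callback */
                while (!list_empty(&req_list)) {
                        req = list_entry(req_list.next, ...);
                        list_del(&req->list);
                        handle_request(req);
                }

                /* nothing left: sleep until my_callback wakes us up */
                init_waitqueue_entry(&wait, current);
                add_wait_queue(&my_waitq, &wait);
                set_current_state(TASK_INTERRUPTIBLE);
                if (list_empty(&req_list))
                        schedule();
                set_current_state(TASK_RUNNING);
                remove_wait_queue(&my_waitq, &wait);
        }
}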
NB: this is just pseudo code, and it needs some locking to protect req_list. If you want real code that uses eq_callback, please look at lnet/selftest/rpc.c.
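To illustrate the locking that the pseudo code leaves out, here is a minimal user-space sketch of the same pattern, assuming POSIX threads rather than the kernel API: a mutex protects the request list and a condition variable plays the role of my_waitq. All names here (demo_event, demo_callback, demo_worker, demo_run) are hypothetical and not part of LNet, and the two event types are only stand-ins for LNET_EVENT_PUT and LNET_EVENT_SEND.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for LNet event types; not the real LNet definitions. */
enum demo_event_type { DEMO_EVENT_PUT, DEMO_EVENT_SEND };

struct demo_event { enum demo_event_type type; int payload; };
struct demo_req   { struct demo_req *next; int payload; };

static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER; /* protects the queue */
static pthread_cond_t  req_cond = PTHREAD_COND_INITIALIZER;  /* role of my_waitq */
static struct demo_req *req_head, *req_tail;
static int shutting_down;
static int handled_count, handled_sum;

/* Analog of my_callback(): filter by type, queue a request, wake the worker. */
static void demo_callback(const struct demo_event *ev)
{
    struct demo_req *req;

    if (ev->type != DEMO_EVENT_PUT)
        return;                       /* e.g. SEND events are simply ignored */

    req = malloc(sizeof(*req));
    if (req == NULL)
        return;
    req->payload = ev->payload;
    req->next = NULL;

    pthread_mutex_lock(&req_lock);
    if (req_tail != NULL)
        req_tail->next = req;
    else
        req_head = req;
    req_tail = req;
    pthread_cond_signal(&req_cond);
    pthread_mutex_unlock(&req_lock);
}

/* Analog of my_thread(): drain the queue, then sleep instead of spinning. */
static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&req_lock);
    for (;;) {
        while (req_head != NULL) {
            struct demo_req *req = req_head;

            req_head = req->next;
            if (req_head == NULL)
                req_tail = NULL;
            pthread_mutex_unlock(&req_lock);

            /* "handle_request": done outside the lock */
            handled_count++;
            handled_sum += req->payload;
            free(req);

            pthread_mutex_lock(&req_lock);
        }
        assert(req_tail == NULL);     /* empty queue: head and tail both NULL */
        if (shutting_down)
            break;
        /* Atomically drops req_lock and sleeps until signalled. */
        pthread_cond_wait(&req_cond, &req_lock);
    }
    pthread_mutex_unlock(&req_lock);
    return NULL;
}

/* Feed a mix of PUT and SEND events; returns how many requests were handled. */
static int demo_run(void)
{
    static const struct demo_event evs[] = {
        { DEMO_EVENT_SEND, 1 }, { DEMO_EVENT_PUT, 10 }, { DEMO_EVENT_SEND, 2 },
        { DEMO_EVENT_PUT, 20 }, { DEMO_EVENT_SEND, 3 },
    };
    pthread_t tid;
    size_t i;

    handled_count = handled_sum = 0;
    shutting_down = 0;
    if (pthread_create(&tid, NULL, demo_worker, NULL) != 0)
        return -1;

    for (i = 0; i < sizeof(evs) / sizeof(evs[0]); i++)
        demo_callback(&evs[i]);

    pthread_mutex_lock(&req_lock);
    shutting_down = 1;                /* worker exits once the queue is empty */
    pthread_cond_signal(&req_cond);
    pthread_mutex_unlock(&req_lock);

    pthread_join(tid, NULL);
    return handled_count;
}
```

Calling demo_run() pushes three SEND and two PUT events through the callback; only the PUTs reach the worker, and the worker blocks in pthread_cond_wait() between requests rather than spinning on the lock.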
Yes, LNetEQPoll is not exported, so if you really want to use it, just add this line to lnet/lnet/module.c:
EXPORT_SYMBOL(LNetEQPoll);
Though LNetEQPoll and LNetEQWait should behave exactly the same in your case.
Regards
Liang
On Mar 6, 2011, at 2:51 AM, Ravi wrote:
> Hello
> Thanks for the reply. I am using lustre-1.8.1.1. I am working on a new module which is used for some kind of delegation operations, and I am using LNet operations in this module.
>
>
> The code which fails is
> do {
> rc = LNetEQWait(lnet_eq_hd, &ev);
> if( ev.type == LNET_EVENT_PUT )
> break;
>
> } while ( rc != 0);
>
>
>
>
> Here I am waiting for a PUT event from a client and then break out of the loop and do some operations accordingly. But the next time I perform a PUT operation (for example) and it gets logged into the event queue, I try reading that event and the MDS fails.
> I also tried using rc = LNetEQPoll(&lnet_eq_hd, 1, 2000, &ev, &which); in place of LNetEQWait, but it is reported as undefined.
>
> Can you please shed some light on the eq_callback function, as I haven't found it in the LNet manual?
>
>
> The log before crashing shows :
>
>
> Mar 4 20:35:08 ws11 kernel: type=LNET_EVENT_SEND, pt-idx=53, mbits=0x1234abcd, rlen=64, mlen=64, md.user_ptr=0xaaaabbbb, hdr-data=0x0
> Mar 4 20:35:08 ws11 kernel: status=0, unlnk=0, offset=0, seq=2
>
> // I am assuming it fails here, as everything up to this point prints fine. I also want to mention that this operation is successful as well.
>
>
> Mar 4 20:35:27 ws11 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [insmod:4609]
> Mar 4 20:35:27 ws11 kernel: CPU 0:
> Mar 4 20:35:27 ws11 kernel: Modules linked in: tmod(U) ksocklnd(U) ko2iblnd(FU) lnet(U) libcfs(U) autofs4(U) hidp(U) nfs(U) fscache(U) nfs_acl(U) rfcomm(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U) freq_table(U) ip_conntrack_netbios_ns(U) ipt_REJECT(U) xt_state(U) ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_en(U) mlx4_ib(U) mlx4_core(U) loop(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U) backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) lp(U) snd_hda_intel(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) ib_mthca(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) ib_mad(U) snd_hwdep(U) snd(U) sg(U) ib_core(U) e100(U) ide_cd(
> Mar 4 20:35:27 ws11 kernel: ) mii(U) serio_raw(U) pcspkr(U) i2c_i801(U) cdrom(U) soundcore(U) parport_pc(U) shpchp(U) i2c_core(U) parport(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
> Mar 4 20:35:27 ws11 kernel: Pid: 4609, comm: insmod Tainted: GF 2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust #2
> Mar 4 20:35:27 ws11 kernel: RIP: 0010:[<ffffffff80064c54>] [<ffffffff80064c54>] .text.lock.spinlock+0x2/0x30
> Mar 4 20:35:27 ws11 kernel: RSP: 0018:ffff810021099d10 EFLAGS: 00000286
> Mar 4 20:35:27 ws11 kernel: RAX: 0000000000000002 RBX: 00000000ffffffff RCX: ffff810021099df8
> Mar 4 20:35:27 ws11 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff8888a7a0
> Mar 4 20:35:27 ws11 kernel: RBP: ffff81003a04e1c0 R08: ffff810021099ddc R09: 000000001234abcd
> .......
>
>
>
> I hope this helps
>
> Thanks
>
>
>
>
>
> -----Original Message-----
> From: Liang Zhen <liang at whamcloud.com>
> To: Ravi <raviprakashdrbh at aol.com>
> Cc: lustre-devel <lustre-devel at lists.lustre.org>
> Sent: Fri, Mar 4, 2011 8:54 pm
> Subject: Re: [Lustre-devel] LNetPoll undefined
>
> Hi Ravi,
>
> Which version of Lustre/LNet are you trying with? Are you trying to build some new code over LNet? Could you show us some example code if you don't mind?
> By the way, if you are trying this in kernel space, I would suggest using eq_callback (LNetEQAlloc(...eq_callback)) instead of LNetEQPoll/LNetEQWait, which is better for performance. Polling is not good for performance because all EQs share one single waitq in LNet.
>
> Regards
> Liang
>
> On Mar 5, 2011, at 9:30 AM, Ravi wrote:
>
>> Hello
>>
>> I am using LNetWait (a blocking call) on a particular event. After I receive this event I break from the loop which waits for it and proceed, but when another event is added to the event queue the system crashes. I thought LNetPoll would be better, as I can just poll for that particular event without disturbing the event queue, but when I run make it is reported as undefined. Any thoughts?
>>
>> Thanks
>> Ravi
>>
>> _______________________________________________
>> Lustre-devel mailing list
>> Lustre-devel at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-devel
>