[Lustre-discuss] soft lockup on Lustre 1.6.2 + Ubuntu 2.6.15 patchless

Niklas Edmundsson Niklas.Edmundsson at hpc2n.umu.se
Tue Oct 9 05:15:15 PDT 2007


Hi all!

We managed to get a soft lockup on a Lustre client while doing some 
stress testing. The clients run the Lustre 1.6.2 patchless client on 
the Ubuntu Dapper 2.6.15 kernel. As far as I understand, the Ubuntu 
2.6.15 kernel includes the patch needed to work with the patchless 
client.

As usual, I didn't find anything similar in Bugzilla or by googling. 
Any hints on what's going wrong would be helpful.

This is the description I got of what was done:

Let 44 tasks (22 nodes with 2 tasks each) do:
rsync -a master rankNN
When they have all finished, on another node do:
rm -rf rank*
That node immediately hit the following:

[696899.812091] BUG: soft lockup detected on CPU#0!
[696899.919769] CPU 0:
[696899.968179] Modules linked in: osc mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs nfs lockd sunrpc iptable_filter ip_tables openafs ipv6 autofs4 ext2 ext3 jbd md_mod ipmi_devintf ipmi_si ipmi_msghandler tg3 mx_driver mx_mcp hw_random psmouse i2c_amd756 shpchp pci_hotplug i2c_core pcspkr serio_raw evdev xfs exportfs dm_mod ide_generic ohci_hcd usbcore ide_cd cdrom ide_disk generic amd74xx thermal processor fan fbcon tileblit font bitblit softcursor capability commoncap
[696900.973641] Pid: 16696, comm: rm Tainted: P      2.6.15-29-amd64-server #1
[696901.133969] RIP: 0010:[<ffffffff801a8630>] <ffffffff801a8630>{__d_lookup+288}
[696901.298891] RSP: 0018:ffff8100a5f83c58  EFLAGS: 00000286
[696901.427423] RAX: ffff8100a7795250 RBX: 0000000000000000 RCX: 0000000000000014
[696901.595905] RDX: 000000000001d578 RSI: 00c399114511d578 RDI: ffff8100d1f0b238
[696901.763747] RBP: ffff81005c7f2e00 R08: 000000080585b155 R09: ffff81005b1ed000
[696901.931231] R10: 0000000000000050 R11: ffffffff801e3ac0 R12: 0000000188517366
[696902.098758] R13: 0000000000000010 R14: 0000000000000246 R15: 0000000000000283
[696902.266669] FS:  00002aaaaadfb6d0(0000) GS:ffffffff80453800(0000) knlGS:00000000556a9a90
[696902.456911] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[696902.593287] CR2: 0000000000511988 CR3: 00000000bc4c6000 CR4: 00000000000006e0
[696902.763134]
[696902.763135] Call Trace:<ffffffff88514035>{:ptlrpc:lock_res_and_lock+53} <ffffffff8019d3dc>{do_lookup+60}
[696903.017468]        <ffffffff8019df66>{__link_path_walk+2518} <ffffffff88660787>{:lustre:ll_readdir+3191}
[696903.241819]        <ffffffff8019e4f0>{link_path_walk+128} <ffffffff801a95a0>{update_atime+64}
[696903.441728]        <ffffffff8019eaf8>{path_lookup+440} <ffffffff8019ec9e>{__user_walk+62}
[696903.632782]        <ffffffff80197e06>{vfs_lstat+38} <ffffffff801a95a0>{update_atime+64}
[696903.823144]        <ffffffff801984af>{sys_newlstat+31} <ffffffff801a254e>{vfs_readdir+174}
[696904.015504]        <ffffffff801a2884>{sys_getdents64+180} <ffffffff8010fd82>{system_call+126}
[696904.217213]
[696904.873276] LustreError: 16698:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff81015a2d7600 x1868515/t0 o101->MGS@130.239.78.233@tcp:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
[696905.274439] LustreError: 16698:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 1 previous similar message
[696906.232276] LustreError: 4194:0:(client.c:961:ptlrpc_expire_one_request()) @@@ timeout (sent at 1191925375, 100s ago)  req@ffff81018965b800 x1868505/t0 o250->MGS@130.239.78.233@tcp:26 lens 304/328 ref 2 fl Rpc:/0/0 rc 0/-22
[696906.694129] LustreError: 4194:0:(client.c:961:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
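
For anyone who wants to try reproducing this, the test steps described
above can be sketched roughly as the script below. It only prints the
commands instead of running them, since the rsync jobs have to be
started on the 22 nodes by whatever job launcher is in use. The
zero-padded rank numbering (rank00 to rank43) and the variable and
function names are my assumptions, not part of the description I got.

```sh
#!/bin/sh
# Rough sketch of the stress test, under the assumptions stated above.
# It prints the commands only; nothing is copied or removed.

NODES=22            # number of client nodes (from the description)
TASKS_PER_NODE=2    # tasks per node (from the description)
SRC=master          # source tree each task copies

# Print one rsync command per task. The rankNN naming with two
# zero-padded digits is an assumption about what "NN" stands for.
gen_rsync_cmds() {
    total=$((NODES * TASKS_PER_NODE))
    rank=0
    while [ "$rank" -lt "$total" ]; do
        printf 'rsync -a %s rank%02d\n' "$SRC" "$rank"
        rank=$((rank + 1))
    done
}

# Phase 1: all 44 tasks copy the tree, each into its own directory.
gen_rsync_cmds

# Phase 2: once every rsync has finished, a different node removes
# all the copies. This is the step that triggered the soft lockup.
echo 'rm -rf rank*'
```

Everything is run from the same directory on the Lustre mount, so the
final rm has to walk all 44 copies of the tree in one go.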


/Nikke
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se     |    nikke at hpc2n.umu.se
---------------------------------------------------------------------------
  KISS (:<) Keep It Simple Stupid
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=