[Lustre-discuss] BUG on lustre patchless client 1.6.3; many small files

Christopher Mason cjmason at gmail.com
Fri Nov 16 20:43:44 PST 2007


I hit the following BUG using the Lustre 1.6.3 patchless client on
Linux 2.6.20 (Fedora Core 6).  It hard-locked the machine, so I was
unable to tell whether there was a subsequent panic.  This happened while
copying approx. 2.7 TB from ext3 to Lustre; about 2.6 TB had been copied,
and I haven't verified whether the data made it across intact.  The copy
included a huge number (> 1 M) of tiny files and took about 60 hours over
GigE; the tiny files cause a tremendous performance hit.  That is not at
all surprising; I just wonder whether it's related to the bug.
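For a rough sense of how large that hit was, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and that about 2.6 TB moved in the ~60 hours; both numbers are approximations taken from the paragraph above, and the variable names are just illustrative.

```python
# Back-of-the-envelope average throughput for the ext3 -> Lustre copy.
# Assumptions: decimal units (1 TB = 10**12 bytes), ~2.6 TB copied in ~60 hours.
TB = 10**12
copied_bytes = 2.6 * TB
elapsed_s = 60 * 3600

avg_mb_s = copied_bytes / elapsed_s / 10**6  # average megabytes per second
gige_mb_s = 1e9 / 8 / 10**6                  # raw gigabit Ethernet line rate (125 MB/s)

print(f"average: {avg_mb_s:.1f} MB/s")
print(f"fraction of GigE line rate: {avg_mb_s / gige_mb_s:.0%}")
```

Under those assumptions the average is about 12 MB/s, roughly a tenth of the raw GigE line rate, which is consistent with per-file overhead dominating when most files are tiny.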

I'm trying to get access to the Lustre OSTs and MDTs and will post
their logs if there are any.

I'm fairly new to Lustre; are issues like this common when running a
somewhat unusual kernel?

Thanks,

-c

Linux rome.mayo.edu 2.6.20-1.2952.fc6 #1 SMP Wed May 16 18:18:22 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


Nov 16 05:28:34 rome kernel: BUG: soft lockup detected on CPU#0!
Nov 16 05:28:34 rome kernel:
Nov 16 05:28:34 rome kernel: Call Trace:
Nov 16 05:28:34 rome kernel:  <IRQ>  [<ffffffff802b0bdb>] softlockup_tick+0xdb/0xf6
Nov 16 05:28:34 rome kernel:  [<ffffffff8028f5d0>] update_process_times+0x42/0x68
Nov 16 05:28:34 rome kernel:  [<ffffffff80271f0c>] smp_local_timer_interrupt+0x34/0x55
Nov 16 05:28:34 rome kernel:  [<ffffffff802725e8>] smp_apic_timer_interrupt+0x51/0x69
Nov 16 05:28:34 rome kernel:  [<ffffffff8025ace6>] apic_timer_interrupt+0x66/0x70
Nov 16 05:28:34 rome kernel:  <EOI>  [<ffffffff8022bde9>] dummy_inode_permission+0x0/0x3
Nov 16 05:28:34 rome kernel:  [<ffffffff8020933c>] __d_lookup+0xdd/0x110
Nov 16 05:28:34 rome kernel:  [<ffffffff8020ca8f>] do_lookup+0x2a/0x1ae
Nov 16 05:28:34 rome kernel:  [<ffffffff80209c72>] __link_path_walk+0x903/0xdb0
Nov 16 05:28:34 rome kernel:  [<ffffffff8020e78d>] link_path_walk+0x55/0xd7
Nov 16 05:28:34 rome kernel:  [<ffffffff8020c8f7>] do_path_lookup+0x1b5/0x217
Nov 16 05:28:34 rome kernel:  [<ffffffff802123d6>] getname+0x152/0x1b8
Nov 16 05:28:34 rome kernel:  [<ffffffff802237fb>] __user_walk_fd+0x37/0x4c
Nov 16 05:28:34 rome kernel:  [<ffffffff8023dc58>] vfs_lstat_fd+0x18/0x47
Nov 16 05:28:34 rome kernel:  [<ffffffff8022a50f>] sys_newlstat+0x19/0x31
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a231>] tracesys+0x71/0xe1
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a29c>] tracesys+0xdc/0xe1
Nov 16 05:28:34 rome kernel:
Nov 16 05:30:04 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212504, 100s ago)  req@ffff8100bd777a00 x66226136/t0 o4->protfs-OST0003_UUID@129.176.249.193@tcp:28 lens 384/352 ref 2 fl Rpc:/0/0 rc 0/-22
Nov 16 05:30:04 rome kernel: Lustre: protfs-OST0003-osc-ffff8100e50a5c00: Connection to service protfs-OST0003 via nid 129.176.249.193@tcp was lost; in progress operations using this service will wait for recovery to complete.
Nov 16 05:30:09 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212509, 100s ago)  req@ffff8100c98afa00 x66226138/t0 o4->protfs-OST0001_UUID@129.176.249.201@tcp:28 lens 384/352 ref 3 fl Rpc:/0/0 rc 0/-22


etc., etc.
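To line the Lustre timeouts up against the soft lockup, the "sent at" values in the log can be decoded as Unix epoch seconds. The sketch below is only an illustration: it converts the two values to UTC and adds the 100 s timeout reported in the same lines. The resulting minutes and seconds (xx:30:04 and xx:30:09) match the syslog timestamps, implying a 6-hour offset between UTC and the host clock; reading that as US Central Standard Time is an assumption about the host's time zone, not something the log states.

```python
from datetime import datetime, timedelta, timezone

# "sent at" epoch values copied from the two ptlrpc_expire_one_request() lines.
sent = [1195212504, 1195212509]
timeout_s = 100  # from "100s ago" in the same log lines

for t in sent:
    sent_utc = datetime.fromtimestamp(t, tz=timezone.utc)
    expired_utc = sent_utc + timedelta(seconds=timeout_s)
    # Print when each request was sent and when it timed out, both in UTC.
    print(sent_utc.strftime("%Y-%m-%d %H:%M:%S"), "->",
          expired_utc.strftime("%H:%M:%S"), "UTC")
```

Under that reading, the first write (o4) to OST0003 went out at 05:28:24 local time, about 10 s before the soft lockup was reported at 05:28:34.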



