[Lustre-discuss] Questions from a Lustre newbie
Robert Healey
healer at rpi.edu
Fri Jul 18 13:47:51 PDT 2008
Greetings.
I'm starting to investigate Lustre to see whether it would work in my
situation or whether it's a solution in search of a problem. I currently
have 136 Sun V20zs running RHEL 5.2 and 26TB split between two other Sun
Opteron systems. I'm adding 4 Sun x4500s @ 48TB each and 129 8-way Xeon
systems to the cluster.
I'm testing with one of the x4500s as the OSS with 6 OSTs and one of the
V20zs acting as both MGS/MDT and the test client, and I keep getting
stack dumps. As a load test, I am trying to rsync the entire set of
NFS-exported folders from the existing 26TB file servers to the Lustre
file system. After about 500GB has been copied, I get a stack dump on
the V20z and the rsync hangs (it can be killed). I know that with RHEL 4.2
on the V20z, if I ran the SMP kernel and used it as an NFS server, it
would do a full kernel panic under load. That was reproducible across
the entire cluster, but I have not had that problem yet with RHEL 5.2.
Thank you for your time and any words of advice.
Bob Healey
Excerpt from dmesg:
Jul 17 16:32:42 compute-4-10 kernel: LustreError: 3834:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
Jul 17 16:32:42 compute-4-10 kernel: LustreError: 3834:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Jul 17 16:32:42 compute-4-10 kernel: Lustre: 3834:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 3834
Jul 17 16:32:42 compute-4-10 kernel: ldlm_cn_06 R running task 0 3834 1 3835 3833 (L-TLB)
Jul 17 16:32:42 compute-4-10 kernel: ffff810034917e50 0000000000000046 ffff810034da55c8 ffffffff8006b6c9
Jul 17 16:32:42 compute-4-10 kernel: ffff810040ee99c0 ffffffff88626771 ffff810034da5400 ffff810034da54e0
Jul 17 16:32:42 compute-4-10 kernel: ffff81003f5c1d40 ffffffff88624456 ffff810034da5588 0000000000000000
Jul 17 16:32:42 compute-4-10 kernel: Call Trace:
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff8006b6c9>] do_gettimeofday+0x50/0x92
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff88624456>] :libcfs:lcw_update_time+0x16/0x100
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff887770bc>] :ptlrpc:ptlrpc_main+0xdcc/0xf50
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
Jul 17 16:32:42 compute-4-10 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Jul 17 16:32:43 compute-4-10 kernel: [<ffffffff887762f0>] :ptlrpc:ptlrpc_main+0x0/0xf50
Jul 17 16:32:43 compute-4-10 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Jul 17 16:32:43 compute-4-10 kernel:
Jul 17 16:32:43 compute-4-10 kernel: LustreError: dumping log to /tmp/lustre-log.1216326763.3834
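For triage, a syslog excerpt like the one above (whether or not a mail client has wrapped it) can be collapsed to one record per kernel message and the Lustre assertion lines pulled out. What follows is only a minimal Python sketch under stated assumptions: the names `unwrap` and `lustre_events` and both regular expressions are hypothetical, inferred from the line shapes in this excerpt and not taken from any Lustre tooling.

```python
import re

# Assumed syslog prefix, as seen in the excerpt: "Jul 17 16:32:42 host kernel:"
PREFIX = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ kernel:")

# Assumed Lustre message shape: "LustreError: pid:0:(file.c:line:func()) text"
LUSTRE = re.compile(
    r"(?P<level>LustreError|Lustre): (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def unwrap(text):
    """Rejoin records that were wrapped across several lines."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if PREFIX.match(line) or not records:
            # A syslog prefix starts a new record.
            records.append(line)
        else:
            # Anything else is treated as a continuation of the previous record.
            records[-1] += " " + line
    return records


def lustre_events(text):
    """Return one dict per Lustre message that carries a source location."""
    events = []
    for record in unwrap(text):
        match = LUSTRE.search(record)
        if match:
            events.append(match.groupdict())
    return events
```

Calling `lustre_events(open("/var/log/messages").read())` (the path is an assumption) would return dictionaries with `level`, `pid`, `file`, `line`, `func`, and `msg` keys. Lines without a `pid:0:(file:line:func())` section, such as the final "dumping log to" message, are skipped by this sketch.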
--
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu