[Lustre-discuss] Questions from a Lustre newbie

Robert Healey healer at rpi.edu
Fri Jul 18 13:47:51 PDT 2008


Greetings.

I'm starting to investigate Lustre to see whether it would work in my 
situation or whether it's a solution in search of a problem.  I currently 
have 136 Sun V20zs running RHEL 5.2 and 26 TB split between two other Sun 
Opteron systems.  I'm adding 4 Sun x4500s at 48 TB each and 129 8-way Xeon 
systems to the cluster.

While testing with one of the x4500s as the OSS with 6 OSTs and 
one of the V20zs acting as MGS/MDT and as the test client, I keep getting 
stack dumps.  As a load test, I am attempting to rsync the 
entire set of NFS-exported folders from the existing 26 TB file servers 
to the Lustre file system.  After about 500 GB has been copied, I get a 
stack dump on the V20z and the rsync hangs (it can be killed).  I know 
that with RHEL 4.2 on the V20z, running the SMP kernel and using the 
machine as an NFS server would cause a full kernel panic under load.  
That was reproducible across the entire cluster, but I have not had that 
problem yet with RHEL 5.2.

Thank you for your time and any words of advice.

Bob Healey

Excerpt from dmesg:
Jul 17 16:32:42 compute-4-10 kernel: LustreError: 
3834:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource 
!= NULL)
failed
Jul 17 16:32:42 compute-4-10 kernel: LustreError: 
3834:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Jul 17 16:32:42 compute-4-10 kernel: Lustre: 
3834:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for 
process 3834
Jul 17 16:32:42 compute-4-10 kernel: ldlm_cn_06    R  running task 
  0  3834      1          3835  3833 (L-TLB)
Jul 17 16:32:42 compute-4-10 kernel:  ffff810034917e50 0000000000000046 
ffff810034da55c8 ffffffff8006b6c9
Jul 17 16:32:42 compute-4-10 kernel:  ffff810040ee99c0 ffffffff88626771 
ffff810034da5400 ffff810034da54e0
Jul 17 16:32:42 compute-4-10 kernel:  ffff81003f5c1d40 ffffffff88624456 
ffff810034da5588 0000000000000000
Jul 17 16:32:42 compute-4-10 kernel: Call Trace:
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff8006b6c9>] 
do_gettimeofday+0x50/0x92
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff88624456>] 
:libcfs:lcw_update_time+0x16/0x100
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff800868b0>] 
__wake_up_common+0x3e/0x68
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff887770bc>] 
:ptlrpc:ptlrpc_main+0xdcc/0xf50
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff80088432>] 
default_wake_function+0x0/0xe
Jul 17 16:32:42 compute-4-10 kernel:  [<ffffffff8005bfb1>] 
child_rip+0xa/0x11
Jul 17 16:32:43 compute-4-10 kernel:  [<ffffffff887762f0>] 
:ptlrpc:ptlrpc_main+0x0/0xf50
Jul 17 16:32:43 compute-4-10 kernel:  [<ffffffff8005bfa7>] 
child_rip+0x0/0x11
Jul 17 16:32:43 compute-4-10 kernel:
Jul 17 16:32:43 compute-4-10 kernel: LustreError: dumping log to 
/tmp/lustre-log.1216326763.3834
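
In case it helps anyone scanning a longer log for the same kind of event, here is a minimal sketch that rejoins the mail-wrapped lines and keeps only the LustreError records. It assumes the syslog prefix looks like the one in the excerpt above ("Mon DD HH:MM:SS host kernel:"); the function names `unwrap` and `lustre_errors` are made up for illustration and are not part of any Lustre tool.

```python
import re

# Syslog prefix as it appears in the excerpt, e.g.
# "Jul 17 16:32:42 compute-4-10 kernel: " (assumed format).
PREFIX = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: ?")


def unwrap(lines):
    """Join wrapped continuation lines onto the syslog record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if PREFIX.match(line):
            records.append(line.rstrip())
        elif records and line.strip():
            records[-1] += " " + line.strip()
    return records


def lustre_errors(lines):
    """Return (timestamp, host, message) for every LustreError record."""
    found = []
    for record in unwrap(lines):
        match = PREFIX.match(record)
        message = record[match.end():]
        if message.startswith("LustreError:"):
            found.append((match.group(1), match.group(2), message))
    return found


# A few lines copied from the excerpt above, wrapped as they appear in the mail.
sample = [
    "Jul 17 16:32:42 compute-4-10 kernel: LustreError: ",
    "3834:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource ",
    "!= NULL)",
    "failed",
    "Jul 17 16:32:42 compute-4-10 kernel: LustreError: ",
    "3834:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG",
    "Jul 17 16:32:42 compute-4-10 kernel: Lustre: ",
    "3834:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for ",
    "process 3834",
]

for timestamp, host, message in lustre_errors(sample):
    print(timestamp, host, message)
```

The same functions can be fed the lines of a saved log file (for example via `open(path).readlines()`), as long as the prefix format matches.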


-- 
Bob Healey
Systems Administrator
Physics Department, RPI
healer at rpi.edu



