[Lustre-discuss] Odd performance issue with 1.4.x OSS ...
Andreas Dilger
adilger at sun.com
Mon Nov 3 09:47:41 PST 2008
On Oct 31, 2008 13:02 -0700, Steden Klaus wrote:
> Our Lustre started exhibiting some curious performance issues today
> ... basically, it slowed down dramatically and reliable I/O performance
> became impossible. I looked through the output of dmesg and saw a number
> of kernel 'oops' messages, but not being a Lustre kernel expert, I'm
> not exactly sure what they indicate. I stopped the OSTs on the node in
> question and ran e2fsck on the OST drives, but they've come up clean so
> I don't think it's a hardware problem. I don't have physical access to
> the machine right now so it may in fact be something on the back end,
> but I'm working on verifying that with a technician on site. In the
> meantime ... can anyone help decipher this for me? There are a couple
> of messages like it:
This kind of message is of relatively little use unless it includes
some of the preceding lines. Are you sure this is an oops, and not a
watchdog timeout that is dumping the stack?
> -- cut --
> ll_ost_215 S 00000100d2141808 0 8584 1 8585 8583 (L-TLB)
> 00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8
> 00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0
> 0000000000000013 0000000000000000
> Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
> <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
> <ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932}
The service thread is sleeping in ptlrpc_queue_wait(), waiting for an RPC
reply. That looks like a network problem, but it is hard to say without
more information.
If you are a supported customer, you will get better service by filing a
bug in Bugzilla. This list only gets "as available" replies, and that is
doubly true for old 1.4 Lustre installations.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.