[Lustre-discuss] OSTs hanging while running IOR

Brian J. Murrell Brian.Murrell at Sun.COM
Wed Sep 9 10:41:01 PDT 2009


On Wed, 2009-09-09 at 14:31 -0300, Rafael David Tinoco wrote:
> Has anyone seen this kind of error while running IOR or some other
> benchmark:

On a note of e-mail formatting, so much vertical whitespace is not
really needed and makes reading a bit more difficult.

Also, personally, I don't wrap log file excerpts at ~80 columns.  I
think most people have a wide enough display to read them unwrapped, and
it makes reading things like stack dumps much, much easier.  Not that
MTAs make it all that easy to avoid wrapping, though.

> One of my OSSs crashes,

What do you mean by "crash"?  Does it oops, or need a reboot, etc?  You
have not really provided enough log for me to determine what context the
following is in:

>  sometimes one, sometimes another. With the following error:
> 
>  
> 
> Sep  9 07:43:40 a01n00 kernel: ll_ost_io_64  D ffff81037fea80c0     0
> 20381      1         20382 20380 (L-TLB)
> 
> Sep  9 07:43:40 a01n00 kernel:  ffff81036316b510 0000000000000046
> 0000000000000003 0000040000000282
> 
> Sep  9 07:43:40 a01n00 kernel:  0000000000000100 0000000000000009
> ffff81037ac09100 ffff81037fea80c0
> 
> Sep  9 07:43:40 a01n00 kernel:  0000088160738e93 0000000000313ec1
> ffff81037ac092e8 0000000328b65740
> 
> Sep  9 07:43:40 a01n00 kernel: Call Trace:
> 
> Sep  9 07:43:40 a01n00 kernel:  [<ffffffff80033608>] submit_bio
> +0xcd/0xd4
> 
> Sep  9 07:43:40 a01n00 kernel:
> [<ffffffff88b14aac>] :obdfilter:filter_do_bio+0x95c/0xb60
> 
> Sep  9 07:43:40 a01n00 kernel:
> [<ffffffff88ae0f24>] :fsfilt_ldiskfs:fsfilt_ldiskfs_write_record
> +0x464/0x4b0
> 
> Sep  9 07:43:40 a01n00 kernel:
> [<ffffffff88b014f0>] :obdfilter:filter_commit_cb+0x0/0x2d0
> 
> Sep  9 07:43:40 a01n00 kernel:
> [<ffffffff88031749>] :jbd:journal_callback_set+0x2d/0x47
> 
> Sep  9 07:43:40 a01n00 kernel:  [<ffffffff8009daef>]
> autoremove_wake_function+0x0/0x2e
...

Can you provide a bit more of the log before the above so we can see
what the stack trace is in reference to?  Also, try to eliminate the
white-space between lines.  Are you getting any other errors or messages
from Lustre prior to that?
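
If the excerpt has already been wrapped and double-spaced by a mail
client, something along these lines can tidy it up before re-posting.
This is only a rough sketch: the function name is made up for
illustration, and it assumes every real log record starts with a classic
syslog timestamp such as "Sep  9 07:43:40", so anything else is treated
as a wrapped continuation of the previous record.

```python
import re

# Assumption: each real record begins with a syslog-style timestamp,
# e.g. "Sep  9 07:43:40 ". Other lines are treated as continuations.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s")


def unwrap_log(text):
    """Drop blank lines and rejoin wrapped continuation lines."""
    records = []
    for raw in text.splitlines():
        # Strip e-mail quote markers ("> ") and surrounding whitespace.
        line = re.sub(r"^(>\s?)+", "", raw).strip()
        if not line:
            continue  # vertical whitespace carries no information
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        elif line.startswith("+"):
            # An offset such as "+0xcd/0xd4" belongs directly to the symbol.
            records[-1] += line
        else:
            records[-1] += " " + line
    return "\n".join(records)
```

Feeding it the quoted text above should give one line per kernel
message, with offsets re-attached to their symbol names.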

Perhaps you are getting some messages saying that various operations are
"slow"?
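
One quick way to check is to pull out the lines leading up to the first
stack trace and look for "slow" among them.  The following is a minimal
sketch, not a Lustre tool: the function name and the default of 50 lines
are arbitrary choices, and it does a plain case-insensitive substring
match rather than assuming any particular message format.

```python
def context_before_trace(lines, before=50, keyword="slow"):
    """Return (context, hits) for the first "Call Trace:" in `lines`.

    context: up to `before` lines preceding the trace
    hits:    the lines of that context mentioning `keyword`
    """
    for i, line in enumerate(lines):
        if "Call Trace:" in line:
            context = lines[max(0, i - before):i]
            break
    else:
        # No trace found: fall back to the tail of the log.
        context = lines[max(0, len(lines) - before):]
    hits = [l for l in context if keyword in l.lower()]
    return context, hits
```

The log would be read with something like
`open(path).read().splitlines()`, where `path` is wherever syslog writes
kernel messages on the OSS.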

Have you tuned these OSSes with respect to the number of OST threads
needed to drive (and not over-drive) your disks?  The Lustre I/O kit
(lustre-iokit) is useful for that tuning.

b.

