[Lustre-discuss] LBUG on lustre 1.8.0

Larry tsrjzq at gmail.com
Sun Nov 21 20:59:13 PST 2010


Unfortunately we don't have a serial console at the moment; perhaps we
can add one per node. Thanks a lot.
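
Until a console log or crash dump is available, the LustreError lines quoted further down are the main evidence for a bug report. Below is a minimal Python sketch for pulling the pid, source file, line number and function name out of such a line. It assumes the wrapped log lines have been rejoined into one line each, and that they follow the "pid:n:(file:line:function()) message" shape seen in this thread. The names LUSTRE_LINE and parse_lustre_line are made up for illustration and are not part of Lustre.

```python
import re

# Matches console lines shaped like the ones quoted in this thread:
#   "<pid>:<n>:(<file>:<line>:<function>()) <message>"
# The group names are illustrative, not an official Lustre log format spec.
LUSTRE_LINE = re.compile(
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.\-]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)


def parse_lustre_line(text):
    """Return a dict with pid, file, line, func, msg, is_lbug; None if no match."""
    m = LUSTRE_LINE.search(text)
    if m is None:
        return None
    info = m.groupdict()
    info["pid"] = int(info["pid"])
    info["line"] = int(info["line"])
    # A bare "LBUG" message marks the line where the assertion fired.
    info["is_lbug"] = info["msg"].strip() == "LBUG"
    return info


# Sample taken from the report in this thread, rejoined onto one line.
sample = (
    "kernel: LustreError: "
    "24669:0:(service.c:1311:ptlrpc_server_handle_request()) LBUG"
)
print(parse_lustre_line(sample))
```

Run over a saved console or syslog capture, this would give the file, line and function to quote when opening the bugzilla entry.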

On Mon, Nov 22, 2010 at 12:18 PM, Kevin Van Maren
<kevin.van.maren at oracle.com> wrote:
> Larry wrote:
>>
>> We added "options libcfs libcfs_panic_on_lbug=1" to modprobe.conf so
>> that the server kernel panics as soon as an LBUG happens. Is there some
>> way to make the server die a few seconds after the LBUG instead? We are
>> also puzzled by the messages that were lost when the LBUG happened.
>>
>
> The messages should have gone to the console just fine (hopefully you are
> logging a serial console).
> If you are talking about /var/log/messages, then yes, it will be missing the
> final output as the messages don't have time to get written to disk on a
> kernel panic.
>
> Kevin
>
>
>> On Mon, Nov 22, 2010 at 10:42 AM, Kevin Van Maren
>> <Kevin.Van.Maren at oracle.com> wrote:
>>
>>>
>>> Sure, but I think for engineering to make progress on this bug, they are
>>> going to want a crash dump.  If you can enable crash dumps and panic on
>>> lbug (and if HA, increase dead timeout so it can complete the dump before
>>> being shot in the head) it would provide more info for the bug report.
>>>
>>> That being said, there are quite a few other bugs that have been fixed
>>> since 1.8.0, so you really should upgrade ASAP to 1.8.4.
>>>
>>> Kevin
>>>
>>>
>>> On Nov 21, 2010, at 6:59 PM, Larry <tsrjzq at gmail.com> wrote:
>>>
>>>
>>>>
>>>> We had an LBUG several days ago on our Lustre 1.8.0. One OSS reported:
>>>>
>>>> kernel: LustreError:
>>>> 24669:0:(service.c:1311:ptlrpc_server_handle_request())
>>>> ASSERTION(atomic_read(&(export)->exp_refcount) < 0x5a5a5a) failed
>>>> kernel: LustreError:
>>>> 24669:0:(service.c:1311:ptlrpc_server_handle_request()) LBUG
>>>> kernel: Lustre: 24669:0:(linux-debug.c:222:libcfs_debug_dumpstack())
>>>> showing stack for process 24669
>>>> ......
>>>>
>>>> I googled for this and found little information about it. It seems to
>>>> be a race condition on the OSS, right? Should I open a bugzilla for this
>>>> LBUG?
>>>> Thanks.
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>>
>>
>
>


