[Lustre-discuss] 1.6.5.1 OSS crashes

Mag Gam magawake at gmail.com
Sun Jul 20 05:40:19 PDT 2008


I am trying to understand this. What was the problem? How does SD_IOSTATS
cause the crash, and how did you disable it?

Sorry for the newbie question.

TIA


On Sun, Jul 20, 2008 at 4:54 AM, Robin Humble
<rjh+lustre at cita.utoronto.ca> wrote:
> On Fri, Jul 18, 2008 at 09:02:36AM -0400, Brian J. Murrell wrote:
>>On Fri, 2008-07-18 at 05:52 -0400, Robin Humble wrote:
>>> Hi,
>>>
>>> I'm seeing coordinated OSS crashes with Lustre 1.6.5.1.
>>>
>>> our RHEL4 OSS have been stable for ~months with these kernels:
>>>   kernel-lustre-smp-2.6.9-67.0.4.EL_lustre.1.6.4.3
>>>   kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.4.2
>>>
>>> but have crashed hard, twice, about 10hrs apart as soon as we started
>>> using this kernel:
>>>   kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1
>>Can you try rebuilding the kernel, disabling SD_IOSTATS?
>
> done. I rebuilt using the stock kernel's InfiniBand stack and
>  # CONFIG_SD_IOSTATS is not set
>
>  % cexec -p oss: uptime
> oss x17:  18:45:07 up 1 day, 30 min,  1 user,  load average: 4.97, 7.00, 6.27
> oss x18:  18:45:07 up 1 day, 23 min,  1 user,  load average: 4.18, 5.78, 5.71
> oss x19:  18:45:07 up 1 day, 23 min,  1 user,  load average: 5.18, 5.66, 4.60
>
> which is >> the 10hrs it was crashing at before.
> good guess about the cause of the problem! :-)
>
> maybe that rhel4 1.6.5.1 kernel rpm needs a respin then? seems like a
> fairly critical issue... :-/
>
> cheers,
> robin
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
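
For reference, the "# CONFIG_SD_IOSTATS is not set" line quoted above is how a
kernel config file records a disabled option. A minimal sketch for checking
this on a running node follows. It assumes the config is installed at
/boot/config-$(uname -r), which is an assumption about the install layout, not
something stated in the thread. kconfig_status is a hypothetical helper name
used only for illustration.

```shell
# Report the state of one kernel option in a given config file.
# Usage: kconfig_status OPTION FILE
# (kconfig_status is a hypothetical helper, not part of any Lustre tooling.)
kconfig_status() {
    opt="$1"
    file="$2"
    if grep -q "^${opt}=" "$file"; then
        # Option is set: print the line as it appears, e.g. CONFIG_FOO=y
        grep "^${opt}=" "$file"
    elif grep -q "^# ${opt} is not set" "$file"; then
        # Kernel configs mark disabled options with this exact comment form
        echo "${opt} is disabled"
    else
        echo "${opt} not found"
    fi
}

# Example call (the path is an assumption, adjust for your install):
#   kconfig_status CONFIG_SD_IOSTATS "/boot/config-$(uname -r)"
```

On a kernel rebuilt as described in the quoted message, this would be expected
to report the option as disabled. It only inspects the config file and does
not show how the rebuild itself was done.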


