[Lustre-discuss] ras_stride_increase_window() ASSERTION failed
di wang
di.wang at oracle.com
Sun Jun 20 22:40:18 PDT 2010
Christopher J.Walker wrote:
> Christopher J.Walker wrote:
>
>> Tom.Wang wrote:
>>
>>> Hello,
>>>
>>> you need the patch in bug 17197, attachment
>>>
>>> https://bugzilla.lustre.org/attachment.cgi?id=28672
>>>
>>> and probably also the patch in
>>>
>>> https://bugzilla.lustre.org/show_bug.cgi?id=22385
>>>
>>>
>> Thanks for the very quick reply.
>>
>> I've recompiled the patchless client with both these patches and have
>> installed it on our machines. I've been running a test for the last 6
>> hours, and initial signs are very good - no repeat of the error message
>> on any of the machines.
>>
>>
>
> I subsequently upgraded to 1.8.3 on the clients. Whilst I didn't see
> problems, one of the users is complaining about poor performance (it's
> possible this has other causes, but the timing is suspicious).
>
> Both patches are labelled "johann: landed1.8.3+"
>
> I'm confused about whether this means the bugs are fixed in 1.8.3 or not.
>
> Bug 22385 is mentioned in the changelog as being fixed (and attempting
> to apply the patch causes a reject).
>
> Bug 17197 isn't mentioned in the changelog, and applying the patch
> mentioned:
> https://bugzilla.lustre.org/attachment.cgi?id=28672
> results in 2 hunks reversed and 3 applied.
>
> Should I downgrade to my 1.8.2 version? Apply the remaining 3 hunks for
> bug 17197 or something else?
>
Yes, these fixes have landed in 1.8.3, so you do not need to downgrade
to 1.8.2.
Thanks
WangDi
> Thanks again,
>
> Chris
>
>
>> Chris
>>
>>
>>> Thanks
>>> WangDi
>>>
>>>
>>> Christopher J. Walker wrote:
>>>
>>>> I see the following error in the logs on some of my lustre clients:
>>>>
>>>> Mar 29 20:58:43 cn507 kernel: LustreError:
>>>> 18750:0:(rw.c:1948:ras_stride_increase_window())
>>>> ASSERTION(ras->ras_window_start + ras->ras_window_len >=
>>>> ras->ras_stride_offset) failed:
>>>> window_start 1792, window_len 0 stride_offset 2017
>>>>
>>>> Several processes seem to be blocking on this machine in state DN.
>>>>
>>>> Is this a known issue? I've looked in bugzilla and not found anything
>>>> obvious (but this is the first time I've looked in your bugzilla).
>>>> I've found
>>>> http://www.nersc.gov/hypermail/nersc-io/att-0612/summary.pdf and had a
>>>> quick flick through, but it refers to mpi-io, which we are not doing,
>>>> and a 1.6 kernel, whereas we are running 1.8.
>>>>
>>>> I'm running 1.8.2 servers (downloaded from Sun/Oracle), and 1.8.2
>>>> clients compiled from source on a Scientific Linux 2.6.18-164.15.1.el5
>>>> kernel.
>>>>
>>>> /var/log/messages says:
>>>>
>>>>
>>>>> Mar 29 20:58:43 cn507 kernel: LustreError:
>>>>> 18750:0:(rw.c:1948:ras_stride_increase_window())
>>>>> ASSERTION(ras->ras_window_start + ras->ras_window_len >=
>>>>> ras->ras_stride_offset) failed:
>>>>> window_start 1792, window_len 0 stride_offset 2017
>>>>> Mar 29 20:58:43 cn507 kernel: LustreError:
>>>>> 18750:0:(rw.c:1948:ras_stride_increase_window()) LBUG
>>>>> Mar 29 20:58:43 cn507 kernel: Pid: 18750, comm: athena.py
>>>>> Mar 29 20:58:43 cn507 kernel: Mar 29 20:58:43 cn507 kernel: Call Trace:
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8844d6a1>]
>>>>> libcfs_debug_dumpstack+0x51/0x60 [libcfs]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8844dbda>]
>>>>> lbug_with_loc+0x7a/0xd0 [libcfs]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8878d63f>]
>>>>> ll_readpage+0x129f/0x1e40 [lustre]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000c707>]
>>>>> add_to_page_cache+0xaa/0xc1
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000c2f5>]
>>>>> do_generic_mapping_read+0x208/0x354
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000d0e0>]
>>>>> file_read_actor+0x0/0x159
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000c58d>]
>>>>> __generic_file_aio_read+0x14c/0x198
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff800c5d8f>]
>>>>> generic_file_readv+0x8f/0xa8
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff800a0307>]
>>>>> autoremove_wake_function+0x0/0x2e
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8879a427>]
>>>>> our_vma+0x117/0x1d0 [lustre]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000b984>]
>>>>> touch_atime+0x67/0xaa
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8875f65b>]
>>>>> ll_file_readv+0x1e4b/0x2130 [lustre]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8875f95a>]
>>>>> ll_file_read+0x1a/0x20 [lustre]
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8000b695>] vfs_read+0xcb/0x171
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff80011b60>] sys_read+0x45/0x6e
>>>>> Mar 29 20:58:43 cn507 kernel: [<ffffffff8006149d>]
>>>>> sysenter_do_call+0x1e/0x76
>>>>> Mar 29 20:58:43 cn507 kernel: Mar 29 20:58:43 cn507 kernel:
>>>>> LustreError: dumping log to /tmp/lustre-log.1269892723.18750
>>>>>
>>>> Thanks,
>>>>
>>>> Chris
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>
>
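For anyone reading this thread later: the failed assertion quoted above can be re-checked by hand from the numbers printed in the log line. Below is a minimal Python sketch, assuming the message format is exactly as in the quoted /var/log/messages excerpt. The helper name check_ras_assertion and the regular expression are illustrative only and are not part of Lustre.

```python
import re

# Pattern for the values printed after "failed:" in the LustreError line.
# Assumption: the format matches the line quoted in this thread.
VALUES_RE = re.compile(
    r"window_start (\d+), window_len (\d+),? stride_offset (\d+)"
)


def check_ras_assertion(message):
    """Return (holds, window_end, stride_offset) for one log message.

    Hypothetical helper: re-evaluates the condition
    ras_window_start + ras_window_len >= ras_stride_offset
    from the numbers printed in the log.
    """
    match = VALUES_RE.search(message)
    if match is None:
        raise ValueError("no readahead window values found in message")
    window_start, window_len, stride_offset = (int(g) for g in match.groups())
    window_end = window_start + window_len
    return window_end >= stride_offset, window_end, stride_offset


# Values copied from the log line in this thread.
line = "window_start 1792, window_len 0 stride_offset 2017"
holds, window_end, stride_offset = check_ras_assertion(line)
print(holds, window_end, stride_offset)
```

With the values from the log, window_start + window_len is 1792 + 0 = 1792, which is less than stride_offset 2017, so the condition is false. That matches the ASSERTION failure and the LBUG that follows it.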