[lustre-discuss] FLR mirroring on 2.12.1-1

Fri May 24 16:26:01 PDT 2019

Hi John,

Thanks for the log. Having looked through it, I think this was broken by
the commit:

```
commit 5a6ceb664f07812c351786c1043da71ff5027f8c
Author: Alex Zhuravlev <alexey.zhuravlev at intel.com>
Date:   Mon Sep 28 16:50:15 2015 +0300

    LU-7236 ptlrpc: idle connections can disconnect
```

In particular, the following change introduced the problem:
```
-               } else if (req->rq_no_delay) {
+               } else if (req->rq_no_delay &&
+                          imp->imp_generation != imp->imp_initiated_at) {
+                       /* ignore nodelay for requests initiating connections */
                         *status = -EWOULDBLOCK;
```

With this change, the RPC request is delayed even when `rq_no_delay` is set.
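
To make the effect of that condition concrete, here is a minimal Python
sketch of just this one branch. It is a simplified model, not the actual
ptlrpc code: the field names `rq_no_delay`, `imp_generation` and
`imp_initiated_at` come from the diff, while the classes and the two function
names are made up for illustration, and the other branches of the real
function are not modeled.

```python
import errno
from dataclasses import dataclass


@dataclass
class Import:
    # Field names taken from the diff; everything else here is hypothetical.
    imp_generation: int
    imp_initiated_at: int


@dataclass
class Request:
    rq_no_delay: bool


def no_delay_branch_before(req, imp):
    """Branch as it was before the commit: no-delay requests fail fast."""
    if req.rq_no_delay:
        return -errno.EWOULDBLOCK
    return None  # falls through to branches not shown in the excerpt


def no_delay_branch_after(req, imp):
    """Branch after the commit: no-delay is ignored when
    imp_generation == imp_initiated_at (request initiating a connection)."""
    if req.rq_no_delay and imp.imp_generation != imp.imp_initiated_at:
        return -errno.EWOULDBLOCK
    return None  # falls through to branches not shown in the excerpt
```

In this model, a no-delay request on an import with `imp_generation ==
imp_initiated_at` got `-EWOULDBLOCK` before the commit but no longer does
afterwards, which would be consistent with the read waiting on the
unavailable OST instead of returning immediately.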

Jinshan

On Fri, May 24, 2019 at 6:29 AM John Doe <ace629426 at gmail.com> wrote:

> I have sent the log file to you in a separate email.
>
> Note: I read four 1MB blocks; the first two 1MB blocks were cached.
>
> On Fri, May 24, 2019 at 1:01 AM Jinshan Xiong <jinshan.xiong at gmail.com>
> wrote:
>
>> Hmm, this is definitely not expected. As long as ost 1 is down, the read
>> should return immediately from the OSC layer and the client should try the
>> 2nd mirror, which is located on ost 7. For the following blocks, it should
>> not even try ost 1 but go to ost 7 directly.
>>
>> Would you please collect a Lustre log and send it to me? You can collect
>> the log on the client side as follows:
>> 0. create mirrored file
>> 1. lctl set_param debug=-1 && lctl clear
>> 2. lctl mark "======= start ========"
>> 3. read the file
>> 4. lctl dk > log.txt
>>
>> and send me the log.txt file. If you can reproduce this problem
>> consistently, please use a small file so that it would be easier to check
>> the log.
>>
>> Jinshan
>>
>> On Mon, May 20, 2019 at 6:20 AM John Doe <ace629426 at gmail.com> wrote:
>>
>>> It turns out that the read eventually finished, but at 1/10th of the
>>> performance that I was expecting.
>>>
>>> As ost idx 1 is unavailable, the client read has to time out on ost idx 1
>>> and only then reads from ost idx 7. This happens for each 1MB block, since
>>> that is the block size I am using.
>>>
>>> Is there a tunable to avoid this issue?
>>>
>>> lfs check osts also takes about 30 seconds as it times out on the
>>> unavailable OST.
>>>
>>> Due to this issue, I am virtually unable to use the mirroring feature.
>>>
>>> On Sun, May 19, 2019 at 4:27 PM John Doe <ace629426 at gmail.com> wrote:
>>>
>>>> After mirroring a file, when one mirror is down, any read from a
>>>> client just hangs. Both server and client are running the latest 2.12.1-1.
>>>> The client waits for ost idx 1 to come back online. I am only unmounting
>>>> ost idx 1, not ost idx 7.
>>>>
>>>> Has anyone tried this feature?
>>>>
>>>> Thanks,
>>>> John.
>>>>
>>>> lfs getstripe mirror10
>>>> mirror10
>>>>   lcm_layout_gen:    5
>>>>   lcm_mirror_count:  2
>>>>   lcm_entry_count:   2
>>>>     lcme_id:             65537
>>>>     lcme_mirror_id:      1
>>>>     lcme_flags:          init
>>>>     lcme_extent.e_start: 0
>>>>     lcme_extent.e_end:   EOF
>>>>       lmm_stripe_count:  1
>>>>       lmm_stripe_size:   1048576
>>>>       lmm_pattern:       raid0
>>>>       lmm_layout_gen:    0
>>>>       lmm_stripe_offset: 1
>>>>       lmm_pool:          01
>>>>       lmm_objects:
>>>>       - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x280a8:0x0] }
>>>>
>>>>     lcme_id:             131074
>>>>     lcme_mirror_id:      2
>>>>     lcme_flags:          init
>>>>     lcme_extent.e_start: 0
>>>>     lcme_extent.e_end:   EOF
>>>>       lmm_stripe_count:  1
>>>>       lmm_stripe_size:   1048576
>>>>       lmm_pattern:       raid0
>>>>       lmm_layout_gen:    0
>>>>       lmm_stripe_offset: 7
>>>>       lmm_pool:          02
>>>>       lmm_objects:
>>>>       - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x28066:0x0] }
>>>>