[lustre-discuss] FLR mirroring on 2.12.1-1
Jinshan Xiong
jinshan.xiong at gmail.com
Fri May 24 16:26:01 PDT 2019
Hi John,
Thanks for the log. After looking through it, I think this was broken by
the following commit:
```
commit 5a6ceb664f07812c351786c1043da71ff5027f8c
Author: Alex Zhuravlev <alexey.zhuravlev at intel.com>
Date: Mon Sep 28 16:50:15 2015 +0300
LU-7236 ptlrpc: idle connections can disconnect
```
In particular, the following change introduced the problem:
```
- } else if (req->rq_no_delay) {
+ } else if (req->rq_no_delay &&
+ imp->imp_generation != imp->imp_initiated_at) {
+ /* ignore nodelay for requests initiating connections */
*status = -EWOULDBLOCK;
```
The added condition causes the RPC request to be delayed even when `rq_no_delay` is set.
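To make the effect concrete, here is a minimal standalone sketch of that one branch before and after the commit. It is a simplified model, not the actual ptlrpc code: `struct model_import`, `struct model_request`, `status_before()` and `status_after()` are hypothetical names introduced only for illustration, and a return value of 0 is my shorthand for "this branch did not fail the request, so it goes on to be delayed".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the fields that appear in the diff. */
struct model_import {
	int imp_generation;
	int imp_initiated_at;
};

struct model_request {
	bool rq_no_delay;
};

/*
 * Branch as it read before the commit: a request marked rq_no_delay
 * always fails fast with -EWOULDBLOCK.
 * Returning 0 is an assumption of this sketch meaning "not failed here".
 */
static inline int status_before(const struct model_import *imp,
				const struct model_request *req)
{
	(void)imp; /* the old condition did not look at the import */
	return req->rq_no_delay ? -EWOULDBLOCK : 0;
}

/*
 * Branch as it reads after the commit: rq_no_delay is honoured only when
 * imp_generation differs from imp_initiated_at. When the two are equal,
 * the no-delay hint is ignored and the request is not failed fast.
 */
static inline int status_after(const struct model_import *imp,
			       const struct model_request *req)
{
	if (req->rq_no_delay &&
	    imp->imp_generation != imp->imp_initiated_at)
		return -EWOULDBLOCK;
	return 0;
}
```

With `imp_generation == imp_initiated_at` and `rq_no_delay` set, `status_before()` returns `-EWOULDBLOCK` while `status_after()` returns 0. That is the case in which a read from the mirror on the unavailable OST waits instead of failing over to the other mirror right away.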
Jinshan
On Fri, May 24, 2019 at 6:29 AM John Doe <ace629426 at gmail.com> wrote:
> I have sent the log file to you in a separate email.
>
> Note: I read four 1MB blocks; the first two 1MB blocks were cached.
>
> On Fri, May 24, 2019 at 1:01 AM Jinshan Xiong <jinshan.xiong at gmail.com>
> wrote:
>
>> Hmm, this is definitely not expected. As long as OST 1 is down, the read
>> should return immediately from the OSC layer and then try the 2nd mirror,
>> which is located on OST 7. For the following blocks, it should not even
>> try OST 1 but go to OST 7 directly.
>>
>> Would you please collect the Lustre log and send it to me? You can collect
>> logs on the client side as follows:
>> 0. create mirrored file
>> 1. lctl set_param debug=-1 && lctl clear
>> 2. lctl mark "======= start ========"
>> 3. read the file
>> 4. lctl dk > log.txt
>>
>> and send me the log.txt file. If you can reproduce this problem
>> consistently, please use a small file so that it would be easier to check
>> the log.
>>
>> Jinshan
>>
>> On Mon, May 20, 2019 at 6:20 AM John Doe <ace629426 at gmail.com> wrote:
>>
>>> It turns out that the read eventually finished, but at 1/10th of the
>>> performance that I was expecting.
>>>
>>> As ost idx 1 is unavailable, the client read has to time out on ost idx 1
>>> before it reads from ost idx 7. This happens for each 1MB block, as I am
>>> using that as the block size.
>>>
>>> Is there a tunable to avoid this issue?
>>>
>>> lfs check osts also takes about 30 seconds as it times out on the
>>> unavailable OST.
>>>
>>> Due to this issue, I am virtually unable to use the mirroring feature.
>>>
>>> I
>>>
>>> On Sun, May 19, 2019 at 4:27 PM John Doe <ace629426 at gmail.com> wrote:
>>>
>>>> After mirroring a file, when one mirror is down, any read from a
>>>> client just hangs. Both server and client are running the latest 2.12.1-1.
>>>> The client waits for ost idx 1 to come back online. I am only unmounting
>>>> ost idx 1, not ost idx 7.
>>>>
>>>> Has anyone tried this feature?
>>>>
>>>> Thanks,
>>>> John.
>>>>
>>>> lfs getstripe mirror10
>>>> mirror10
>>>> lcm_layout_gen: 5
>>>> lcm_mirror_count: 2
>>>> lcm_entry_count: 2
>>>> lcme_id: 65537
>>>> lcme_mirror_id: 1
>>>> lcme_flags: init
>>>> lcme_extent.e_start: 0
>>>> lcme_extent.e_end: EOF
>>>> lmm_stripe_count: 1
>>>> lmm_stripe_size: 1048576
>>>> lmm_pattern: raid0
>>>> lmm_layout_gen: 0
>>>> lmm_stripe_offset: 1
>>>> lmm_pool: 01
>>>> lmm_objects:
>>>> - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x280a8:0x0] }
>>>>
>>>> lcme_id: 131074
>>>> lcme_mirror_id: 2
>>>> lcme_flags: init
>>>> lcme_extent.e_start: 0
>>>> lcme_extent.e_end: EOF
>>>> lmm_stripe_count: 1
>>>> lmm_stripe_size: 1048576
>>>> lmm_pattern: raid0
>>>> lmm_layout_gen: 0
>>>> lmm_stripe_offset: 7
>>>> lmm_pool: 02
>>>> lmm_objects:
>>>> - 0: { l_ost_idx: 7, l_fid: [0x100070000:0x28066:0x0] }
>>>>
>>