[lustre-devel] lustre:pcc: Sanity-pcc 7a test hang(Both on Aarch64 and X86_64) discussion

Andreas Dilger adilger at whamcloud.com
Fri Mar 11 16:01:57 PST 2022

Qian has a patch https://review.whamcloud.com/40092 "LU-14003<https://jira.whamcloud.com/browse/LU-14003> pcc: rework PCC mmap implementation" that changes the PCC mmap code significantly, but it is waiting for the 2.16.0 feature landing window to open.  The patch needs to be refreshed, but it would be helpful if you could look through it to see whether it would resolve the issue you are seeing.

On Mar 11, 2022, at 01:18, Kevin Zhao via lustre-devel <lustre-devel at lists.lustre.org> wrote:


Recently we have been working on the bug https://jira.whamcloud.com/browse/LU-14346, which makes an mmap write hang forever. It was first seen on Aarch64, but with a small change<https://github.com/kevinzs2048/devbox/blob/master/notes/lustre/pcc_sanity-pcc-7a-analysis.md#reproduced-on-x86_64-with-a-small-change> it can easily be reproduced on X86_64. For a more detailed analysis of this bug, see the link<https://github.com/kevinzs2048/devbox/blob/master/notes/lustre/pcc_sanity-pcc-7a-analysis.md>.

The hang occurs here<https://github.com/lustre/lustre-release/blob/master/lustre/tests/multiop.c#L725>, in the following loop:
    case 'W':
        for (i = 0; i < mmap_len && mmap_ptr; i += 4096)
            mmap_ptr[i] += junk++;
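
For anyone who wants to exercise the same access pattern outside of multiop, here is a minimal user-space sketch. The helper name mmap_rmw_pages and the temporary file under /tmp are made up for illustration and are not part of multiop.c; the sketch only assumes a POSIX system. One plausible reason for the platform difference (an assumption, not verified here) is that a compiler targeting a load/store architecture such as aarch64 has to emit a separate load and store for +=, while on x86_64 it may emit a single read-modify-write instruction that faults directly as a write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from multiop.c): map a temporary file MAP_SHARED
 * and touch one byte in every 4096-byte step with a read-modify-write,
 * like the 'W' case above. Returns the number of bytes touched (one per
 * step), or -1 on error.
 */
long mmap_rmw_pages(size_t len)
{
    char path[] = "/tmp/mmap_rmw_XXXXXX";
    char *mmap_ptr;
    long pages = 0;
    int junk = 0;
    int fd;

    assert(len > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once fd is closed */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    mmap_ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping keeps the file alive */
    if (mmap_ptr == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < len; i += 4096) {
        mmap_ptr[i] += junk++;          /* read the byte, then write it back */
        pages++;
    }

    munmap(mmap_ptr, len);
    return pages;
}
```

Calling mmap_rmw_pages(4 * 4096) touches four pages. On an ordinary local filesystem this just completes; it is only meant to show which accesses reach the fault handlers, not to reproduce the PCC hang by itself.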

Bug Analysis - mmap_ptr[i] += junk++ behaves differently on different platforms.
Conceptually, this statement does two things:
1. Read from mmap_ptr[i] first (this triggers the read page fault).
2. Write a value to the same page (this triggers page_mkwrite to make the page writable).

But the kernel handles it quite differently on each platform.
On the aarch64 platform: in do_page_fault, FAULT_FLAG_WRITE is not set, so handle_pte_fault calls do_read_fault.

  *   do_read_fault:
     *   __do_fault -> call ll_fault, get a page from pcc_fault
     *   finish_fault (map the returned page into the page tables)
     *   vmf->flags is VM_FAULT_LOCKED
  *   then, on the write to the now-mapped page: do_wp_page --> do_page_mkwrite --> ll_page_mkwrite

On the X86_64 platform the mechanism is different: in do_page_fault, FAULT_FLAG_WRITE is set, so handle_pte_fault calls do_shared_fault.

  *   do_shared_fault:
     *   __do_fault -> call ll_fault, get a page from pcc_fault
     *   do_page_mkwrite -> call ll_page_mkwrite
     *   finish_fault (map the returned page into the page tables)
     *   fault_dirty_shared_page
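
The two call chains above can be summarized as a small decision function. This is a simplified user-space model, not kernel code: the model_* names are invented for illustration, and it assumes a shared, file-backed mapping and ignores every other case that handle_pte_fault deals with.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the kernel paths named in the lists above. */
enum model_path {
    MODEL_READ_FAULT,    /* do_read_fault: ->fault(), then finish_fault */
    MODEL_SHARED_FAULT,  /* do_shared_fault: ->fault(), ->page_mkwrite() */
    MODEL_WP_PAGE,       /* do_wp_page: ->page_mkwrite() only */
    MODEL_NOTHING_TO_DO  /* PTE already allows the access */
};

/*
 * Simplified model of the choice made for a shared file mapping:
 * no PTE yet       -> full fault path, read or shared depending on the access
 * read-only PTE    -> a write goes to do_wp_page and never calls ->fault()
 */
enum model_path model_pick_path(bool pte_present, bool pte_writable,
                                bool write_access)
{
    if (!pte_present)
        return write_access ? MODEL_SHARED_FAULT : MODEL_READ_FAULT;
    if (write_access && !pte_writable)
        return MODEL_WP_PAGE;
    return MODEL_NOTHING_TO_DO;
}

/* Walk through both platform sequences described in the text. */
void model_self_check(void)
{
    /* aarch64: a read fault first, then a write fault on the mapped page */
    assert(model_pick_path(false, false, false) == MODEL_READ_FAULT);
    assert(model_pick_path(true, false, true) == MODEL_WP_PAGE);

    /* x86_64: a single write fault on an unmapped page */
    assert(model_pick_path(false, false, true) == MODEL_SHARED_FAULT);
}
```

The point of the model is that once a read-only PTE exists, a later write only reaches ->page_mkwrite() and does not go back through ->fault().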

Bug Analysis: why it hangs forever:
See also https://github.com/kevinzs2048/devbox/blob/master/notes/lustre/pcc_sanity-pcc-7a-analysis.md#kernel-do_page_fault-process-analysis for more details.

    Inject the failure 0x1412 OBD_FAIL_LLITE_PCC_DETACH_MKWRITE.
    The fault is retried. Because the PTE is not NULL and vmf->flags has FAULT_FLAG_WRITE set, do_wp_page is called again.
So on every retry we enter do_page_mkwrite again, and the write hangs forever.

Seeking a good solution
As the analysis above shows, we want the kernel to retry the whole mmap write (->fault() and ->page_mkwrite).
In handle_pte_fault, if there is no page or the page is not mapped (no PTE found), then __do_page_fault will go through the full fault handling again.

The easy fix is to remove the page and its page table entry when we do the fail injection in pcc_page_mkwrite. But I have not found a good way to do this, so I am listing the information here and asking the community for help.
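
To make the intended effect of that fix concrete, here is a toy user-space simulation of the retry loop. It is only a sketch under assumptions: all model_* names are invented, the injected failure is modelled as "->page_mkwrite() detaches the file and asks for a retry for as long as the mapped page is the one handed out before the detach", and a re-run of ->fault() after the detach is assumed to return a usable page. None of this is taken from the Lustre source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one page of the mapping. */
struct model_state {
    bool pte_present;    /* a PTE exists for the page */
    bool attached;       /* file is still attached to PCC */
    bool page_from_pcc;  /* page returned by the last ->fault() came from PCC */
};

/* Model of ->fault(): hand out a PCC page only while still attached. */
static void model_fault(struct model_state *s)
{
    s->page_from_pcc = s->attached;
}

/* Model of ->page_mkwrite() with the injected detach; true means "retry". */
static bool model_mkwrite(struct model_state *s, bool clear_pte_on_fail)
{
    if (!s->page_from_pcc)
        return false;            /* page obtained after the detach: success */
    s->attached = false;         /* injected detach */
    if (clear_pte_on_fail)
        s->pte_present = false;  /* proposed fix: drop the mapping */
    return true;
}

/*
 * Returns the number of write-fault attempts needed to complete the write,
 * or -1 if it is still retrying after max_attempts (the "hang").
 * read_first models the aarch64 sequence (read fault before the write).
 */
int model_write_attempts(bool read_first, bool clear_pte_on_fail,
                         int max_attempts)
{
    struct model_state s = { false, true, false };

    assert(max_attempts > 0);

    if (read_first) {            /* do_read_fault maps the page read-only */
        model_fault(&s);
        s.pte_present = true;
    }

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (!s.pte_present)
            model_fault(&s);     /* do_shared_fault: ->fault() runs again */
        /* do_shared_fault and do_wp_page both call ->page_mkwrite() */
        if (model_mkwrite(&s, clear_pte_on_fail))
            continue;            /* retry the fault */
        s.pte_present = true;    /* page mapped writable, write completes */
        return attempt;
    }
    return -1;
}
```

In this model, model_write_attempts(false, false, 100) finishes on the second attempt (the X86_64 sequence), model_write_attempts(true, false, 100) never finishes (the Aarch64 hang), and model_write_attempts(true, true, 100) finishes on the second attempt again once the PTE is dropped before the retry.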

One fix I tried:
calling generic_error_remove_page, but the mapped page still cannot be unmapped successfully. The error log is here<https://github.com/kevinzs2048/devbox/blob/master/notes/lustre/pcc_sanity-pcc-7a-analysis.md#solution>.

Since I'm a newbie to Lustre and not very familiar with the memory management code, I would appreciate any advice on this bug fix. Thanks in advance.

Cheers, Andreas
Andreas Dilger
Lustre Principal Architect
