[Lustre-discuss] o2iblnd no resources

Liang Zhen Zhen.Liang at Sun.COM
Fri Feb 1 23:39:09 PST 2008


Hi Kilian,
I think it's because o2iblnd uses fragmented RDMA by default (up to 256
fragments per transfer), so it has to set max_send_wr to
(concurrent_sends * (256 + 1)) when creating the QP with
rdma_create_qp(). That consumes a lot of resources and can drive a busy
server out of memory. Note that the failing allocation in your trace is
order 4 (16 contiguous pages, i.e. 64kB), and your buddy-allocator dump
shows no free block larger than 32kB in the Normal zone, so the
allocation fails even though MemFree looks reasonable.
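Roughly, the QP sizing looks like this (a minimal sketch, not the
actual o2iblnd code; the constant names and the recv-queue sizing here
are made up for illustration):

#include <linux/string.h>
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>

/* Hypothetical values for illustration; o2iblnd's real limits come
 * from its headers and the concurrent_sends module parameter. */
#define MAX_RDMA_FRAGS   256
#define CONCURRENT_SENDS 8

static int create_conn_qp(struct rdma_cm_id *cmid, struct ib_pd *pd,
                          struct ib_cq *cq)
{
        struct ib_qp_init_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.send_cq     = cq;
        attr.recv_cq     = cq;
        attr.qp_type     = IB_QPT_RC;
        attr.sq_sig_type = IB_SIGNAL_REQ_WR;

        /* One work request per fragment plus one for the message
         * itself: 8 * 257 = 2056 send WRs here.  The HCA driver then
         * kmallocs a large physically contiguous work queue, which is
         * the order-4 allocation failing in your trace. */
        attr.cap.max_send_wr  = CONCURRENT_SENDS * (MAX_RDMA_FRAGS + 1);
        attr.cap.max_recv_wr  = CONCURRENT_SENDS * 2;
        attr.cap.max_send_sge = 1;
        attr.cap.max_recv_sge = 1;

        return rdma_create_qp(cmid, pd, &attr);
}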
To resolve this, we use FMR to map the fragmented buffer to a virtually
contiguous I/O address, so there is always just one fragment per RDMA.
Here is a patch for this problem (using FMR in o2iblnd):
https://bugzilla.lustre.org/attachment.cgi?id=15144
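The patch builds on the kernel's FMR pool API; the idea, roughly (a
sketch only, not the actual patch, with arbitrary pool sizing):

#include <rdma/ib_fmr_pool.h>

/* One FMR pool per device; the sizing here is arbitrary. */
static struct ib_fmr_pool *create_pool(struct ib_pd *pd)
{
        struct ib_fmr_pool_param param = {
                .max_pages_per_fmr = 256,        /* 256 x 4K = 1MB I/O */
                .page_shift        = PAGE_SHIFT,
                .access            = IB_ACCESS_LOCAL_WRITE |
                                     IB_ACCESS_REMOTE_WRITE |
                                     IB_ACCESS_REMOTE_READ,
                .pool_size         = 512,
                .dirty_watermark   = 32,
        };

        return ib_create_fmr_pool(pd, &param);
}

/* Map the pages of a fragmented buffer through an FMR: the peer then
 * sees one virtually contiguous region, so each RDMA needs a single
 * fragment and max_send_wr can shrink accordingly.  Release with
 * ib_fmr_pool_unmap() once the transfer completes. */
static struct ib_pool_fmr *map_frags(struct ib_fmr_pool *pool,
                                     u64 *pages, int npages, u64 iova)
{
        return ib_fmr_pool_map_phys(pool, pages, npages, iova);
}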

Regards
Liang

Kilian CAVALOTTI wrote:
> Hi all,
>
> What can cause a client to receive a "o2iblnd no resources" message 
> from an OSS?
> ---------------------------------------------------------------------------
> Feb  1 15:24:24 node-5-8 kernel: LustreError: 1893:0:(o2iblnd_cb.c:2448:kiblnd_rejected()) 10.10.60.3 at o2ib rejected: o2iblnd no resources
> ---------------------------------------------------------------------------
>
> I suspect an out-of-memory problem, and indeed the OSS logs are filled
> with the following:
> ---------------------------------------------------------------------------
> ib_cm/3: page allocation failure. order:4, mode:0xd0
>
> Call Trace:<ffffffff8015c847>{__alloc_pages+777} <ffffffff801727e9>{alloc_page_interleave+61}
>        <ffffffff8015c8e0>{__get_free_pages+11} <ffffffff8015facd>{kmem_getpages+36}
>        <ffffffff80160262>{cache_alloc_refill+609} <ffffffff8015ff30>{__kmalloc+123}
>        <ffffffffa014ee75>{:ib_mthca:mthca_alloc_qp_common+668}
>        <ffffffffa014f42d>{:ib_mthca:mthca_alloc_qp+178} <ffffffffa0153e3a>{:ib_mthca:mthca_create_qp+311}
>        <ffffffffa00d5b1b>{:ib_core:ib_create_qp+20} <ffffffffa021a5f9>{:rdma_cm:rdma_create_qp+43}
>        <ffffffff8024b7b5>{dma_pool_free+245} <ffffffffa014b257>{:ib_mthca:mthca_init_cq+1073}
>        <ffffffffa01540cf>{:ib_mthca:mthca_create_cq+282} <ffffffff801727e9>{alloc_page_interleave+61}
>        <ffffffffa0400c10>{:ko2iblnd:kiblnd_cq_completion+0}
>        <ffffffffa0400d50>{:ko2iblnd:kiblnd_cq_event+0} <ffffffffa00d5cc1>{:ib_core:ib_create_cq+33}
>        <ffffffffa03f56bd>{:ko2iblnd:kiblnd_create_conn+3565}
>        <ffffffffa0276f38>{:libcfs:cfs_alloc+40} <ffffffffa03fe457>{:ko2iblnd:kiblnd_passive_connect+2215}
>        <ffffffffa00d8595>{:ib_core:ib_find_cached_gid+244}
>        <ffffffffa021a278>{:rdma_cm:cma_acquire_dev+293} <ffffffffa03ff540>{:ko2iblnd:kiblnd_cm_callback+64}
>        <ffffffffa03ff500>{:ko2iblnd:kiblnd_cm_callback+0}
>        <ffffffffa021b19a>{:rdma_cm:cma_req_handler+863} <ffffffff801e8427>{alloc_layer+67}
>        <ffffffff801e8645>{idr_get_new_above_int+423} <ffffffffa00fa0ab>{:ib_cm:cm_process_work+101}
>        <ffffffffa00faa57>{:ib_cm:cm_req_handler+2398} <ffffffffa00fae3c>{:ib_cm:cm_work_handler+0}
>        <ffffffffa00fae6a>{:ib_cm:cm_work_handler+46} <ffffffff80146fca>{worker_thread+419}
>        <ffffffff80133566>{default_wake_function+0} <ffffffff801335b7>{__wake_up_common+67}
>        <ffffffff80133566>{default_wake_function+0} <ffffffff8014ad18>{keventd_create_kthread+0}
>        <ffffffff80146e27>{worker_thread+0} <ffffffff8014ad18>{keventd_create_kthread+0}
>        <ffffffff8014acef>{kthread+200} <ffffffff80110de3>{child_rip+8}
>        <ffffffff8014ad18>{keventd_create_kthread+0} <ffffffff8014ac27>{kthread+0}
>        <ffffffff80110ddb>{child_rip+0}
> Mem-info:
> Node 0 DMA per-cpu:
> cpu 0 hot: low 2, high 6, batch 1
> cpu 0 cold: low 0, high 2, batch 1
> cpu 1 hot: low 2, high 6, batch 1
> cpu 1 cold: low 0, high 2, batch 1
> cpu 2 hot: low 2, high 6, batch 1
> cpu 2 cold: low 0, high 2, batch 1
> cpu 3 hot: low 2, high 6, batch 1
> cpu 3 cold: low 0, high 2, batch 1
> Node 0 Normal per-cpu:
> cpu 0 hot: low 32, high 96, batch 16
> cpu 0 cold: low 0, high 32, batch 16
> cpu 1 hot: low 32, high 96, batch 16
> cpu 1 cold: low 0, high 32, batch 16
> cpu 2 hot: low 32, high 96, batch 16
> cpu 2 cold: low 0, high 32, batch 16
> cpu 3 hot: low 32, high 96, batch 16
> cpu 3 cold: low 0, high 32, batch 16
> Node 0 HighMem per-cpu: empty
>
> Free pages:       35336kB (0kB HighMem)
> Active:534156 inactive:127091 dirty:1072 writeback:0 unstable:0 free:8834 slab:146612 mapped:26222 pagetables:1035
> Node 0 DMA free:9832kB min:52kB low:64kB high:76kB active:0kB inactive:0kB present:16384kB pages_scanned:37 all_unreclaimable? yes
> protections[]: 0 510200 510200
> Node 0 Normal free:25504kB min:16328kB low:20408kB high:24492kB active:2136624kB inactive:508364kB present:4964352kB pages_scanned:0 all_unreclaimable? no
> protections[]: 0 0 0
> Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
> protections[]: 0 0 0
> Node 0 DMA: 2*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 2*4096kB = 9832kB
> Node 0 Normal: 1284*4kB 2290*8kB 126*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25504kB
> Node 0 HighMem: empty
> Swap cache: add 111, delete 111, find 23/36, race 0+0
> Free swap:       4096360kB
> 1245184 pages of RAM
> 235840 reserved pages
> 659867 pages shared
> 0 pages swap cached
> ---------------------------------------------------------------------------
>
> IB links are up and working on both the client and the OSS:
> ---------------------------------------------------------------------------
> client# ibstatus
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0005:ad00:0008:af71
>         base lid:        0x83
>         sm lid:          0x130
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
> oss# ibstatus
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0005:ad00:0008:cb11
>         base lid:        0x126
>         sm lid:          0x130
>         state:           4: ACTIVE
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
> ---------------------------------------------------------------------------
> And the Subnet Manager doesn't report any unusual errors or skyrocketing
> counters (I'm using OFED 1.2, kernel 2.6.9-55.0.9.EL_lustre.1.6.4.1smp).
>
> What I don't really get is that most clients can access files on this
> OSS without issue; besides, my limited understanding of kernel memory
> mechanisms leads me to believe that this OSS is not out of memory:
> ---------------------------------------------------------------------------
> # cat /proc/meminfo
> MemTotal:      4037380 kB
> MemFree:         31688 kB
> Buffers:       1333536 kB
> Cached:        1231900 kB
> SwapCached:          0 kB
> Active:        2138948 kB
> Inactive:       507720 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:      4037380 kB
> LowFree:         31688 kB
> SwapTotal:     4096564 kB
> SwapFree:      4096360 kB
> Dirty:            6868 kB
> Writeback:           0 kB
> Mapped:         106984 kB
> Slab:           588200 kB
> CommitLimit:   6115252 kB
> Committed_AS:   860508 kB
> PageTables:       4304 kB
> VmallocTotal: 536870911 kB
> VmallocUsed:    274788 kB
> VmallocChunk: 536596091 kB
> HugePages_Total:     0
> HugePages_Free:      0
> Hugepagesize:     2048 kB
> ---------------------------------------------------------------------------
>
> This only appeared recently, after several weeks of continuous use of
> the filesystem without any problems. Is there a memory leak somewhere?
> Any help diagnosing the problem would be greatly appreciated.
>
> Thanks!
>   



