[Lustre-discuss] Lustre Client - Memory Issue

Jagga Soorma jagga13 at gmail.com
Mon Apr 19 14:04:44 PDT 2010


Also, here is the /proc/sys/lustre/memused for all my compute nodes:

--
580093506
138839275
26861811
1192253563
137585179
138928251
2214512411
1872972173
138846915
142122291
131465011
139968027
4626653800
188594515
23944887
2495517689
--
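
For scale, the counters above can be converted to more readable units. This is only a minimal Python sketch: it assumes each memused value is a byte count (not confirmed anywhere in this thread), and the "node N" labels are hypothetical, since the listing does not say which value belongs to which host.

```python
# Values copied from the /proc/sys/lustre/memused listing above.
# Assumption: each value is a byte count.
MEMUSED_BYTES = [
    580093506, 138839275, 26861811, 1192253563,
    137585179, 138928251, 2214512411, 1872972173,
    138846915, 142122291, 131465011, 139968027,
    4626653800, 188594515, 23944887, 2495517689,
]

GIB = 1024 ** 3


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


if __name__ == "__main__":
    # "node N" is just the position in the listing, not a real hostname.
    for index, value in enumerate(MEMUSED_BYTES, start=1):
        print(f"node {index:2d}: {to_gib(value):6.2f} GiB")
    print(f"largest: {to_gib(max(MEMUSED_BYTES)):.2f} GiB")
```

Under that assumption the largest counter is a little over 4 GiB, so it may be worth comparing these figures with the Slab line in /proc/meminfo on the same node.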

Can someone help me put this information together and make sense of it?
I am pretty sure this is related to Lustre, but I am not sure what might
be eating up the memory.

Thanks,
-J

On Mon, Apr 19, 2010 at 1:28 PM, Jagga Soorma <jagga13 at gmail.com> wrote:

> I have tried:
>
> echo 1 > /proc/sys/vm/drop_caches
> and
> echo 3 > /proc/sys/vm/drop_caches
>
> However, the free memory does not change at all.  Any ideas what might
> be going on?
>
> -Simran
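
One possible reason drop_caches makes no visible difference: writing 1 drops the page cache, 2 drops reclaimable slab objects such as dentries and inodes, and 3 does both, so memory accounted as SUnreclaim is not expected to be released. The sketch below is a rough check in Python using figures copied from the /proc/meminfo output quoted further down in this thread; the function names are made up for illustration.

```python
# Figures copied from the /proc/meminfo output quoted later in this thread.
MEMINFO_SAMPLE = """\
MemTotal:     198091444 kB
MemFree:        458048 kB
Cached:       34099372 kB
AnonPages:    58704728 kB
Slab:         99806476 kB
SReclaimable:   118532 kB
SUnreclaim:   99687944 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB here)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def unreclaimable_slab_fraction(fields):
    """Fraction of MemTotal held by slab memory marked unreclaimable."""
    return fields["SUnreclaim"] / fields["MemTotal"]


if __name__ == "__main__":
    info = parse_meminfo(MEMINFO_SAMPLE)
    print(f"SUnreclaim: {info['SUnreclaim'] / 1024 ** 2:.1f} GiB")
    print(f"share of MemTotal: {unreclaimable_slab_fraction(info):.1%}")
```

On these numbers, roughly half of MemTotal sits in unreclaimable slab, which would be consistent with drop_caches having no visible effect. To run it against a live node, pass the contents of /proc/meminfo to parse_meminfo() instead of the sample.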
>
>
> On Mon, Apr 19, 2010 at 11:37 AM, Jagga Soorma <jagga13 at gmail.com> wrote:
>
>> Could it be locking?  I do have the flock option enabled.
>>
>> --
>> lustre_inode_cache    123    192    896    4    1 : tunables   54   27    8 : slabdata     48     48      0
>> lov_oinfo            128    228    320   12    1 : tunables   54   27    8 : slabdata     19     19      0
>> ldlm_locks          1550   3992    512    8    1 : tunables   54   27    8 : slabdata    499    499      0
>> ldlm_resources      1449   3600    384   10    1 : tunables   54   27    8 : slabdata    360    360      0
>> --
>>
>> Thanks,
>> -J
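
As a rough sanity check on the locking theory, the footprint of these four caches can be estimated as num_slabs x pagesperslab x page size. A minimal Python sketch, assuming 4 KiB pages (an assumption, although typical for x86_64); the helper name cache_bytes is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this client

# (cache name, num_slabs, pagesperslab) copied from the slabinfo excerpt above.
LOCK_RELATED_CACHES = [
    ("lustre_inode_cache", 48, 1),
    ("lov_oinfo", 19, 1),
    ("ldlm_locks", 499, 1),
    ("ldlm_resources", 360, 1),
]


def cache_bytes(num_slabs, pages_per_slab, page_size=PAGE_SIZE):
    """Memory held by one slab cache, counting whole slabs."""
    return num_slabs * pages_per_slab * page_size


if __name__ == "__main__":
    total = 0
    for name, num_slabs, pages_per_slab in LOCK_RELATED_CACHES:
        size = cache_bytes(num_slabs, pages_per_slab)
        total += size
        print(f"{name:20s} {size / 1024:10.0f} KiB")
    print(f"{'total':20s} {total / 1024 ** 2:10.2f} MiB")
```

With these numbers the four caches add up to only a few MiB, so on this snapshot they do not look like the main consumer.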
>>
>>
>> On Mon, Apr 19, 2010 at 11:26 AM, Jagga Soorma <jagga13 at gmail.com> wrote:
>>
>>> Here is something from April 12 that I see in the client logs.  Not sure
>>> if this is related:
>>>
>>> --
>>> Apr 12 14:51:16 manak kernel: Lustre: 7359:0:(rw.c:2092:ll_readpage()) ino 424411146 page 0 (0) not covered by a lock (mmap?).  check debug logs.
>>> Apr 12 14:51:16 manak kernel: Lustre: 7359:0:(rw.c:2092:ll_readpage()) ino 424411146 page 1480 (6062080) not covered by a lock (mmap?).  check debug logs.
>>> Apr 12 14:51:16 manak kernel: Lustre: 7359:0:(rw.c:2092:ll_readpage()) Skipped 1479 previous similar messages
>>> Apr 12 14:51:17 manak kernel: Lustre: 7359:0:(rw.c:2092:ll_readpage()) ino 424411146 page 273025 (1118310400) not covered by a lock (mmap?).  check debug logs.
>>> Apr 12 14:51:17 manak kernel: Lustre: 7359:0:(rw.c:2092:ll_readpage()) Skipped 271544 previous similar messages
>>> --
>>>
>>> -J
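
A side note on reading these messages: the number in parentheses looks like the byte offset of the page within the file (an interpretation, not something the log states). If so, the pairs pin down the page size on this client, which the short Python check below works out.

```python
# (page index, value in parentheses) copied from the ll_readpage() messages above.
# Assumption: the value in parentheses is the byte offset of that page.
LOGGED_PAGES = [(0, 0), (1480, 6062080), (273025, 1118310400)]


def implied_page_size(pairs):
    """Return the one page size consistent with every pair, or None."""
    sizes = set()
    for page, offset in pairs:
        if page == 0:
            continue  # page 0 carries no information about the page size
        if offset % page != 0:
            return None  # offset is not a whole multiple of the page index
        sizes.add(offset // page)
    return sizes.pop() if len(sizes) == 1 else None


if __name__ == "__main__":
    print(implied_page_size(LOGGED_PAGES))
```

Both non-zero entries give the same quotient, which supports using 4096-byte pages when turning slab counts into bytes.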
>>>
>>>
>>> On Mon, Apr 19, 2010 at 11:02 AM, Jagga Soorma <jagga13 at gmail.com> wrote:
>>>
>>>> Andreas,
>>>>
>>>> I am seeing the problem again on one of my hosts and here is a live
>>>> capture of the data.  Can you assist with this?
>>>>
>>>> --
>>>> # free
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:     198091444  197636852     454592          0       4260   34251452
>>>> -/+ buffers/cache:  163381140   34710304
>>>> Swap:     75505460   10281796   65223664
>>>>
>>>> # cat /proc/meminfo
>>>> MemTotal:     198091444 kB
>>>> MemFree:        458048 kB
>>>> Buffers:          4268 kB
>>>> Cached:       34099372 kB
>>>> SwapCached:    7730744 kB
>>>> Active:       62919152 kB
>>>> Inactive:     34107188 kB
>>>> SwapTotal:    75505460 kB
>>>> SwapFree:     65220676 kB
>>>> Dirty:             444 kB
>>>> Writeback:           0 kB
>>>> AnonPages:    58704728 kB
>>>> Mapped:          12036 kB
>>>> Slab:         99806476 kB
>>>> SReclaimable:   118532 kB
>>>> SUnreclaim:   99687944 kB
>>>> PageTables:     131200 kB
>>>>
>>>> NFS_Unstable:        0 kB
>>>> Bounce:              0 kB
>>>> WritebackTmp:        0 kB
>>>> CommitLimit:  174551180 kB
>>>> Committed_AS: 65739660 kB
>>>>
>>>> VmallocTotal: 34359738367 kB
>>>> VmallocUsed:    588416 kB
>>>> VmallocChunk: 34359149923 kB
>>>> HugePages_Total:     0
>>>> HugePages_Free:      0
>>>> HugePages_Rsvd:      0
>>>> HugePages_Surp:      0
>>>> Hugepagesize:     2048 kB
>>>> DirectMap4k:      8432 kB
>>>> DirectMap2M:  201308160 kB
>>>>
>>>> # cat /proc/slabinfo
>>>> slabinfo - version: 2.1
>>>> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
>>>> nfs_direct_cache       0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> nfs_write_data        36     44    704   11    2 : tunables   54   27    8 : slabdata      4      4      0
>>>> nfs_read_data         32     33    704   11    2 : tunables   54   27    8 : slabdata      3      3      0
>>>> nfs_inode_cache        0      0    984    4    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> nfs_page               0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> rpc_buffers            8      8   2048    2    1 : tunables   24   12    8 : slabdata      4      4      0
>>>> rpc_tasks              8     12    320   12    1 : tunables   54   27    8 : slabdata      1      1      0
>>>> rpc_inode_cache        0      0    832    4    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> ll_async_page     8494811 8507076    320   12    1 : tunables   54   27    8 : slabdata 708923 708923    216
>>>> ll_file_data          16     40    192   20    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> lustre_inode_cache     95    184    896    4    1 : tunables   54   27    8 : slabdata     46     46      0
>>>> lov_oinfo             56    180    320   12    1 : tunables   54   27    8 : slabdata     15     15      0
>>>> osc_quota_info         0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> ll_qunit_cache         0      0    112   34    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> llcd_cache             0      0   3952    1    1 : tunables   24   12    8 : slabdata      0      0      0
>>>> ptlrpc_cbdatas         0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> interval_node       1680   5730    128   30    1 : tunables  120   60    8 : slabdata    191    191      0
>>>> ldlm_locks          2255   6232    512    8    1 : tunables   54   27    8 : slabdata    779    779      0
>>>> ldlm_resources      2227   5570    384   10    1 : tunables   54   27    8 : slabdata    557    557      0
>>>> ll_import_cache        0      0   1248    3    1 : tunables   24   12    8 : slabdata      0      0      0
>>>> ll_obdo_cache          0 459630919    208   19    1 : tunables  120   60    8 : slabdata      0 24191101      0
>>>> ll_obd_dev_cache      13     13   5672    1    2 : tunables    8    4    0 : slabdata     13     13      0
>>>> obd_lvfs_ctxt_cache      0      0     96   40    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> SDP                    0      0   1728    4    2 : tunables   24   12    8 : slabdata      0      0      0
>>>> fib6_nodes             7     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> ip6_dst_cache         10     24    320   12    1 : tunables   54   27    8 : slabdata      2      2      0
>>>> ndisc_cache            3     30    256   15    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> RAWv6                 35     36    960    4    1 : tunables   54   27    8 : slabdata      9      9      0
>>>> UDPLITEv6              0      0    960    4    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> UDPv6                  7     12    960    4    1 : tunables   54   27    8 : slabdata      3      3      0
>>>> tw_sock_TCPv6          0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> request_sock_TCPv6      0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> TCPv6                  3      4   1792    2    1 : tunables   24   12    8 : slabdata      2      2      0
>>>> ib_mad              2051   2096    448    8    1 : tunables   54   27    8 : slabdata    262    262      0
>>>> fuse_request           0      0    608    6    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> fuse_inode             0      0    704   11    2 : tunables   54   27    8 : slabdata      0      0      0
>>>> kcopyd_job             0      0    360   11    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> dm_uevent              0      0   2608    3    2 : tunables   24   12    8 : slabdata      0      0      0
>>>> dm_clone_bio_info      0      0     16  202    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> dm_rq_target_io        0      0    408    9    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> dm_target_io           0      0     24  144    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> dm_io                  0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> uhci_urb_priv          1     67     56   67    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> ext3_inode_cache    2472   2610    768    5    1 : tunables   54   27    8 : slabdata    522    522      0
>>>> ext3_xattr             0      0     88   44    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> journal_handle        56    288     24  144    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> journal_head         216    240     96   40    1 : tunables  120   60    8 : slabdata      6      6      0
>>>> revoke_table           4    202     16  202    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> revoke_record          0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> sgpool-128             2      2   4096    1    1 : tunables   24   12    8 : slabdata      2      2      0
>>>> sgpool-64              2      2   2048    2    1 : tunables   24   12    8 : slabdata      1      1      0
>>>> sgpool-32              2      4   1024    4    1 : tunables   54   27    8 : slabdata      1      1      0
>>>> sgpool-16              2      8    512    8    1 : tunables   54   27    8 : slabdata      1      1      0
>>>> sgpool-8               2     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> scsi_data_buffer       0      0     24  144    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> scsi_io_context        0      0    112   34    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> flow_cache             0      0     96   40    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> cfq_io_context        58    207    168   23    1 : tunables  120   60    8 : slabdata      9      9      0
>>>> cfq_queue             56    308    136   28    1 : tunables  120   60    8 : slabdata     11     11      0
>>>> bsg_cmd                0      0    312   12    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> mqueue_inode_cache      1      4    896    4    1 : tunables   54   27    8 : slabdata      1      1      0
>>>> isofs_inode_cache      0      0    608    6    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> minix_inode_cache      0      0    624    6    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> hugetlbfs_inode_cache      1      7    576    7    1 : tunables   54   27    8 : slabdata      1      1      0
>>>> dnotify_cache          0      0     40   92    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> dquot                  0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> inotify_event_cache      0      0     40   92    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> inotify_watch_cache     94    159     72   53    1 : tunables  120   60    8 : slabdata      3      3      0
>>>> kioctx                 0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> kiocb                  0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> fasync_cache           0      0     24  144    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> shmem_inode_cache    878   1040    784    5    1 : tunables   54   27    8 : slabdata    208    208      0
>>>> pid_namespace          0      0   2112    3    2 : tunables   24   12    8 : slabdata      0      0      0
>>>> nsproxy                0      0     56   67    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> posix_timers_cache      0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> uid_cache              7     60    128   30    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> UNIX                 128    220    704   11    2 : tunables   54   27    8 : slabdata     20     20      0
>>>> ip_mrt_cache           0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> UDP-Lite               0      0    832    9    2 : tunables   54   27    8 : slabdata      0      0      0
>>>> tcp_bind_bucket       15    118     64   59    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> inet_peer_cache        1     59     64   59    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> secpath_cache          0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> xfrm_dst_cache         0      0    384   10    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> ip_fib_alias           0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> ip_fib_hash           15    106     72   53    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> ip_dst_cache          40     84    320   12    1 : tunables   54   27    8 : slabdata      7      7      0
>>>> arp_cache              8     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> RAW                   33     35    768    5    1 : tunables   54   27    8 : slabdata      7      7      0
>>>> UDP                   11     36    832    9    2 : tunables   54   27    8 : slabdata      4      4      0
>>>> tw_sock_TCP            4     20    192   20    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> request_sock_TCP       0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> TCP                   16     24   1664    4    2 : tunables   24   12    8 : slabdata      6      6      0
>>>> eventpoll_pwq         69    159     72   53    1 : tunables  120   60    8 : slabdata      3      3      0
>>>> eventpoll_epi         69    150    128   30    1 : tunables  120   60    8 : slabdata      5      5      0
>>>> pfm_event_set          0      0  57344    1   16 : tunables    8    4    0 : slabdata      0      0      0
>>>> pfm_context            0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
>>>> blkdev_integrity       0      0    112   34    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> blkdev_queue          10     12   2264    3    2 : tunables   24   12    8 : slabdata      4      4      0
>>>> blkdev_requests       91    130    368   10    1 : tunables   54   27    8 : slabdata     13     13     27
>>>> blkdev_ioc            56    371     72   53    1 : tunables  120   60    8 : slabdata      7      7      0
>>>> biovec-256             2      2   4096    1    1 : tunables   24   12    8 : slabdata      2      2      0
>>>> biovec-128             2      4   2048    2    1 : tunables   24   12    8 : slabdata      2      2      0
>>>> biovec-64              2      8   1024    4    1 : tunables   54   27    8 : slabdata      2      2      0
>>>> biovec-16              2     30    256   15    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> biovec-4               2    118     64   59    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> biovec-1             223    606     16  202    1 : tunables  120   60    8 : slabdata      3      3     70
>>>> bio_integrity_payload      2     60    128   30    1 : tunables  120   60    8 : slabdata      2      2      0
>>>> bio                  205    330    128   30    1 : tunables  120   60    8 : slabdata     11     11     70
>>>> sock_inode_cache     245    300    640    6    1 : tunables   54   27    8 : slabdata     50     50      0
>>>> skbuff_fclone_cache     14     14    512    7    1 : tunables   54   27    8 : slabdata      2      2      0
>>>> skbuff_head_cache   5121   5985    256   15    1 : tunables  120   60    8 : slabdata    399    399     68
>>>> file_lock_cache        4     22    176   22    1 : tunables  120   60    8 : slabdata      1      1      0
>>>> Acpi-Operand         889   1749     72   53    1 : tunables  120   60    8 : slabdata     33     33      0
>>>> Acpi-ParseExt          0      0     72   53    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> Acpi-Parse             0      0     48   77    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> Acpi-Namespace       617    672     32  112    1 : tunables  120   60    8 : slabdata      6      6      0
>>>> task_delay_info      389    884    112   34    1 : tunables  120   60    8 : slabdata     26     26      0
>>>> taskstats              0      0    328   12    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> page_cgroup            0      0     40   92    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> proc_inode_cache    1397   1446    608    6    1 : tunables   54   27    8 : slabdata    240    241    190
>>>> sigqueue              29     96    160   24    1 : tunables  120   60    8 : slabdata      4      4      1
>>>> radix_tree_node   193120 196672    552    7    1 : tunables   54   27    8 : slabdata  28096  28096    216
>>>> bdev_cache             5     15    768    5    1 : tunables   54   27    8 : slabdata      3      3      0
>>>> sysfs_dir_cache    19120  19296     80   48    1 : tunables  120   60    8 : slabdata    402    402      0
>>>> mnt_cache             30    105    256   15    1 : tunables  120   60    8 : slabdata      7      7      0
>>>> inode_cache         1128   1176    560    7    1 : tunables   54   27    8 : slabdata    166    168     24
>>>> dentry              4651   8189    208   19    1 : tunables  120   60    8 : slabdata    431    431      0
>>>> filp                1563   2720    192   20    1 : tunables  120   60    8 : slabdata    136    136    242
>>>> names_cache          142    142   4096    1    1 : tunables   24   12    8 : slabdata    142    142     96
>>>> key_jar                0      0    192   20    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> buffer_head         1129   3071    104   37    1 : tunables  120   60    8 : slabdata     83     83      0
>>>> mm_struct             86    136    896    4    1 : tunables   54   27    8 : slabdata     34     34      1
>>>> vm_area_struct      3406   4136    176   22    1 : tunables  120   60    8 : slabdata    188    188     26
>>>> fs_cache             140    531     64   59    1 : tunables  120   60    8 : slabdata      9      9      1
>>>> files_cache           83    150    768    5    1 : tunables   54   27    8 : slabdata     30     30      1
>>>> signal_cache         325    388    960    4    1 : tunables   54   27    8 : slabdata     97     97      0
>>>> sighand_cache        317    369   2112    3    2 : tunables   24   12    8 : slabdata    123    123      0
>>>> task_xstate          155    256    512    8    1 : tunables   54   27    8 : slabdata     32     32      2
>>>> task_struct          368    372   5872    1    2 : tunables    8    4    0 : slabdata    368    372      0
>>>> anon_vma             966   1728     24  144    1 : tunables  120   60    8 : slabdata     12     12      0
>>>> pid                  377    960    128   30    1 : tunables  120   60    8 : slabdata     32     32      0
>>>> shared_policy_node      0      0     48   77    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> numa_policy           15    112    136   28    1 : tunables  120   60    8 : slabdata      4      4      0
>>>> idr_layer_cache      284    322    544    7    1 : tunables   54   27    8 : slabdata     46     46      0
>>>> size-4194304(DMA)      0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-4194304           0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-2097152(DMA)      0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-2097152           0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-1048576(DMA)      0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-1048576           0      0 1048576    1  256 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-524288(DMA)       0      0 524288    1  128 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-524288            0      0 524288    1  128 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-262144(DMA)       0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-262144            0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
>>>> size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
>>>> size-131072            3      3 131072    1   32 : tunables    8    4    0 : slabdata      3      3      0
>>>> size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
>>>> size-65536             6      6  65536    1   16 : tunables    8    4    0 : slabdata      6      6      0
>>>> size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
>>>> size-32768            10     10  32768    1    8 : tunables    8    4    0 : slabdata     10     10      0
>>>> size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
>>>> size-16384            44     44  16384    1    4 : tunables    8    4    0 : slabdata     44     44      0
>>>> size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
>>>> size-8192           3611   3611   8192    1    2 : tunables    8    4    0 : slabdata   3611   3611      0
>>>> size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    8 : slabdata      0      0      0
>>>> size-4096           1771   1771   4096    1    1 : tunables   24   12    8 : slabdata   1771   1771      0
>>>> size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    8 : slabdata      0      0      0
>>>> size-2048           4609   4714   2048    2    1 : tunables   24   12    8 : slabdata   2357   2357      0
>>>> size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> size-1024           4829   4900   1024    4    1 : tunables   54   27    8 : slabdata   1225   1225      0
>>>> size-512(DMA)          0      0    512    8    1 : tunables   54   27    8 : slabdata      0      0      0
>>>> size-512            1478   1520    512    8    1 : tunables   54   27    8 : slabdata    190    190     39
>>>> size-256(DMA)          0      0    256   15    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> size-256            4662   5550    256   15    1 : tunables  120   60    8 : slabdata    370    370      1
>>>> size-128(DMA)          0      0    128   30    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> size-64(DMA)           0      0     64   59    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> size-64            17232  29382     64   59    1 : tunables  120   60    8 : slabdata    498    498      0
>>>> size-32(DMA)           0      0     32  112    1 : tunables  120   60    8 : slabdata      0      0      0
>>>> size-128            9907  16140    128   30    1 : tunables  120   60    8 : slabdata    538    538      0
>>>> size-32            12487  13104     32  112    1 : tunables  120   60    8 : slabdata    117    117      0
>>>> kmem_cache           181    181   4224    1    2 : tunables    8    4    0 : slabdata    181    181      0
>>>>
>>>> Tasks: 278 total,   1 running, 276 sleeping,   0 stopped,   1 zombie
>>>> Cpu(s):  3.8%us,  0.1%sy,  0.0%ni, 96.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>>> Mem:  198091444k total, 197636988k used,   454456k free,     4544k buffers
>>>> Swap: 75505460k total,  8567448k used, 66938012k free, 29144008k cached
>>>>
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>>   107 root      15  -5     0    0    0 D   10  0.0   5:06.43 kswapd1
>>>> 19328 user1     20   0 66.5g  60g 2268 D    4 32.0  31:48.49 R
>>>>     1 root      20   0  1064   64   32 S    0  0.0   0:21.20 init
>>>>     2 root      15  -5     0    0    0 S    0  0.0   0:00.06 kthreadd
>>>>     3 root      RT  -5     0    0    0 S    0  0.0   0:00.24 migration/0
>>>>     4 root      15  -5     0    0    0 S    0  0.0   1:01.12 ksoftirqd/0
>>>>     5 root      RT  -5     0    0    0 S    0  0.0   0:00.30 migration/1
>>>>     6 root      15  -5     0    0    0 S    0  0.0   0:00.50 ksoftirqd/1
>>>>     7 root      RT  -5     0    0    0 S    0  0.0   0:00.22 migration/2
>>>>     8 root      15  -5     0    0    0 S    0  0.0   0:00.36 ksoftirqd/2
>>>>     9 root      RT  -5     0    0    0 S    0  0.0   0:00.28 migration/3
>>>>    10 root      15  -5     0    0    0 S    0  0.0   0:00.60 ksoftirqd/3
>>>>    11 root      RT  -5     0    0    0 S    0  0.0   0:00.18 migration/4
>>>>    12 root      15  -5     0    0    0 S    0  0.0   0:00.40 ksoftirqd/4
>>>>    13 root      RT  -5     0    0    0 S    0  0.0   0:00.26 migration/5
>>>>    14 root      15  -5     0    0    0 S    0  0.0   0:00.76 ksoftirqd/5
>>>>    15 root      RT  -5     0    0    0 S    0  0.0   0:00.20 migration/6
>>>>    16 root      15  -5     0    0    0 S    0  0.0   0:00.36 ksoftirqd/6
>>>>    17 root      RT  -5     0    0    0 S    0  0.0   0:00.26 migration/7
>>>>    18 root      15  -5     0    0    0 S    0  0.0   0:00.68 ksoftirqd/7
>>>>    19 root      RT  -5     0    0    0 S    0  0.0   0:00.88 migration/8
>>>>    20 root      15  -5     0    0    0 S    0  0.0   0:07.70 ksoftirqd/8
>>>>    21 root      RT  -5     0    0    0 S    0  0.0   0:01.12 migration/9
>>>>    22 root      15  -5     0    0    0 S    0  0.0   0:01.20 ksoftirqd/9
>>>>    23 root      RT  -5     0    0    0 S    0  0.0   0:03.50 migration/10
>>>>    24 root      15  -5     0    0    0 S    0  0.0   0:01.22 ksoftirqd/10
>>>>    25 root      RT  -5     0    0    0 S    0  0.0   0:04.84 migration/11
>>>>    26 root      15  -5     0    0    0 S    0  0.0   0:01.90 ksoftirqd/11
>>>>    27 root      RT  -5     0    0    0 S    0  0.0   0:01.46 migration/12
>>>>    28 root      15  -5     0    0    0 S    0  0.0   0:01.42 ksoftirqd/12
>>>>    29 root      RT  -5     0    0    0 S    0  0.0   0:01.62 migration/13
>>>>    30 root      15  -5     0    0    0 S    0  0.0   0:01.84 ksoftirqd/13
>>>>    31 root      RT  -5     0    0    0 S    0  0.0   0:01.90 migration/14
>>>>    32 root      15  -5     0    0    0 S    0  0.0   0:01.18 ksoftirqd/14
>>>> --
>>>>
>>>> Thanks,
>>>> -J
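
To see which caches dominate, the slabinfo rows can be ranked by approximate footprint (num_slabs x pagesperslab x page size). The Python sketch below is not an official tool: it assumes 4 KiB pages and the slabinfo 2.1 column layout shown in the header above, the function names are made up for illustration, and the sample rows are copied from the output above with each row joined onto one line.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this client

# A few rows copied from the /proc/slabinfo output above, one row per line.
SLABINFO_SAMPLE = """\
ll_async_page 8494811 8507076 320 12 1 : tunables 54 27 8 : slabdata 708923 708923 216
ll_obdo_cache 0 459630919 208 19 1 : tunables 120 60 8 : slabdata 0 24191101 0
radix_tree_node 193120 196672 552 7 1 : tunables 54 27 8 : slabdata 28096 28096 216
ldlm_locks 2255 6232 512 8 1 : tunables 54 27 8 : slabdata 779 779 0
size-8192 3611 3611 8192 1 2 : tunables 8 4 0 : slabdata 3611 3611 0
"""


def parse_slabinfo(text):
    """Yield (name, active_objs, num_objs, footprint_bytes) for each cache row."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("slabinfo -"):
            continue  # skip blank lines and the two header lines
        head, _, slabdata = line.partition(": slabdata")
        cols = head.split()
        name, active_objs, num_objs = cols[0], int(cols[1]), int(cols[2])
        pages_per_slab = int(cols[5])
        num_slabs = int(slabdata.split()[1])
        yield name, active_objs, num_objs, num_slabs * pages_per_slab * PAGE_SIZE


def rank_by_footprint(text):
    """Return the parsed rows sorted from largest to smallest footprint."""
    return sorted(parse_slabinfo(text), key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, active, total, size in rank_by_footprint(SLABINFO_SAMPLE):
        print(f"{name:18s} {size / 1024 ** 3:8.2f} GiB  active {active}/{total}")
```

On this sample, ll_obdo_cache comes out at roughly 92 GiB while reporting zero active objects, which would account for most of the Slab figure in /proc/meminfo. Treat that as a pointer for further investigation rather than a diagnosis. To rank a live system, feed the contents of /proc/slabinfo to rank_by_footprint() instead of the sample.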
>>>>
>>>> On Mon, Apr 19, 2010 at 10:07 AM, Andreas Dilger <
>>>> andreas.dilger at oracle.com> wrote:
>>>>
>>>>> There is a known problem with the DLM LRU size that may be affecting
>>>>> you. It may be something else too. Please check /proc/{slabinfo,meminfo} to
>>>>> see what is using the memory on the client.
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>>
>>>>> On 2010-04-19, at 10:43, Jagga Soorma <jagga13 at gmail.com> wrote:
>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> My users are reporting some memory issues on our Lustre 1.8.1
>>>>>> clients.  When they submit a single job at a time, the run time is
>>>>>> about 4.5 minutes.  However, when they run multiple jobs (10 or fewer)
>>>>>> at once on a single client node with 192GB of memory, the run time of
>>>>>> each job is 3-4X that of the single job.  They also noticed that swap
>>>>>> usage kept climbing even though there was plenty of free memory on the
>>>>>> system.  Could this be related to the Lustre client?  Does it reserve
>>>>>> any memory that is not accessible to other processes even though it
>>>>>> might not be in use?
>>>>>>
>>>>>> Thanks much,
>>>>>> -J