[Lustre-devel] Hangs with cgroup memory controller
Mark Hills
Mark.Hills at framestore.com
Wed Jul 27 11:57:57 PDT 2011
On Wed, 27 Jul 2011, Andreas Dilger wrote:
[...]
> Possibly you can correlate reproducer cases with Lustre errors on the
> console?
I've managed to catch the bad state, on a clean client too -- there are no
errors reported from Lustre in dmesg.
Here's the information reported by the cgroup. It seems that there's a
discrepancy of two pages (the 'cache' field, and pgpgin versus pgpgout).
The process which was in the group terminated a long time ago.
I can leave the machine in this state until tomorrow, so any suggestions
for data to capture that could help trace this bug would be welcomed.
Thanks.
# cd /cgroup/p25321
# echo 1 > memory.force_empty
<hangs: the bug>
# cat tasks
<none>
# cat memory.max_usage_in_bytes
1281351680
# cat memory.usage_in_bytes
8192
# cat memory.stat
cache 8192 <--- two pages
rss 0
mapped_file 0
pgpgin 396369 <--- two pages higher than pgpgout
pgpgout 396367
swap 0
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit 8388608000
hierarchical_memsw_limit 10485760000
total_cache 8192
total_rss 0
total_mapped_file 0
total_pgpgin 396369
total_pgpgout 396367
total_swap 0
total_inactive_anon 0
total_active_anon 0
total_inactive_file 0
total_active_file 0
total_unevictable 0
# echo 1 > /proc/sys/vm/drop_caches
<success>
# echo 2 > /proc/sys/vm/drop_caches
<success>
# cat memory.stat
<same as above>
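
For anyone wanting to check for the same state on their own clients, the
two-page discrepancy above can be computed from memory.stat with a small
script. This is only a rough sketch: it assumes 4 KiB pages (consistent
with "cache 8192" being two pages), and the function names and the SAMPLE
text are made up for illustration, with the values copied from the output
above. On a live system the text would come from the group's memory.stat
file instead.

```python
# Sketch: cross-check a cgroup memory.stat dump for leftover page charges.
PAGE_SIZE = 4096  # assumption: 4 KiB pages, so "cache 8192" is two pages


def parse_memory_stat(text):
    """Parse 'name value' lines, ignoring trailing notes such as '<--- ...'."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def leftover_pages(stats, page_size=PAGE_SIZE):
    """Return (pgpgin - pgpgout, (cache + rss) in pages) for one group."""
    by_events = stats["pgpgin"] - stats["pgpgout"]
    by_bytes = (stats["cache"] + stats["rss"]) // page_size
    return by_events, by_bytes


# Values copied from the memory.stat output above (hypothetical variable name).
SAMPLE = """\
cache 8192
rss 0
mapped_file 0
pgpgin 396369
pgpgout 396367
swap 0
"""

if __name__ == "__main__":
    by_events, by_bytes = leftover_pages(parse_memory_stat(SAMPLE))
    print("pages still charged (pgpgin - pgpgout):", by_events)
    print("pages still charged (cache + rss):", by_bytes)
```

A group with no tasks where either number is non-zero would be in the
same state as the one shown here.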
--
Mark