I have a Java process running on Kubernetes (k8s).
I set -Xms and -Xmx for the process:
java -Xms512M -Xmx1G -XX:SurvivorRatio=8 -XX:NewRatio=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -jar automation.jar
My expectation is that the pod should consume 1.5 or 2 GB of memory, but it consumes much more, nearly 3.5 GB, which is too much. If I run the process on a virtual machine, it consumes much less memory.
When I check the memory stats for the pod, I realize that the pod allocates too much cache memory.
An RSS of nearly 1.5 GB is OK, because Xmx is 1 GB. But why is the cache nearly 3 GB?
Is there any way to tune or control this usage?
/app $ cat /sys/fs/cgroup/memory/memory.stat
cache 2881228800
rss 1069154304
rss_huge 446693376
mapped_file 1060864
swap 831488
pgpgin 1821674
pgpgout 966068
pgfault 467261
pgmajfault 47
inactive_anon 532504576
active_anon 536588288
inactive_file 426450944
active_file 2454777856
unevictable 0
hierarchical_memory_limit 16657932288
hierarchical_memsw_limit 9223372036854771712
total_cache 2881228800
total_rss 1069154304
total_rss_huge 446693376
total_mapped_file 1060864
total_swap 831488
total_pgpgin 1821674
total_pgpgout 966068
total_pgfault 467261
total_pgmajfault 47
total_inactive_anon 532504576
total_active_anon 536588288
total_inactive_file 426450944
total_active_file 2454777856
total_unevictable 0
CodePudding user response:
A Java process may consume much more physical memory than specified in -Xmx; I explained it in this answer.
However, in your case, it's not even the memory of the Java process, but rather the OS-level page cache. Typically you don't need to care about the page cache, since it is shared, reclaimable memory: when an application wants to allocate more memory but there are not enough immediately available free pages, the OS will likely free part of the page cache automatically. In this sense, the page cache should not be counted as "used" memory; it's more like spare memory that the OS puts to good use while the application does not need it.
The page cache often grows when an application does a lot of file I/O, and this is fine.
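To make the split between process memory and page cache concrete, here is a minimal Python sketch that parses a few counters copied from the memory.stat output in the question. The helper names (parse_memory_stat, summarize) are hypothetical, made up for this illustration. In this particular dump, inactive_file plus active_file adds up exactly to cache, which suggests the whole "cache" figure is file-backed page cache; that equality is an observation about this sample, not a general rule.

```python
# Sketch: separate anonymous memory (rss) from page cache in a cgroup v1
# memory.stat dump. The values below are copied from the question.
SAMPLE = """\
cache 2881228800
rss 1069154304
inactive_file 426450944
active_file 2454777856
"""

MIB = 2 ** 20


def parse_memory_stat(text):
    """Return {counter_name: value_in_bytes} from 'name value' lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats):
    """Convert the interesting counters to MiB (hypothetical helper)."""
    file_pages = stats["inactive_file"] + stats["active_file"]
    return {
        "rss_mib": stats["rss"] / MIB,          # anonymous memory: heap, stacks, etc.
        "cache_mib": stats["cache"] / MIB,      # page cache charged to the cgroup
        "file_pages_mib": file_pages / MIB,     # file-backed pages on the LRU lists
    }


if __name__ == "__main__":
    # On a pod, SAMPLE could be replaced with the contents of
    # /sys/fs/cgroup/memory/memory.stat (cgroup v1 path, as in the question).
    for name, value in summarize(parse_memory_stat(SAMPLE)).items():
        print(f"{name}: {value:.0f}")
```

With the question's numbers this gives roughly 1 GiB of rss and roughly 2.7 GiB of cache, so the "extra" memory is reclaimable page cache rather than memory held by the JVM.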
Async-profiler may help to find the exact source of growth:
run it with -e filemap:mm_filemap_add_to_page_cache
I demonstrated this approach in my presentation.