Symptoms: I allocate a TLS key with a destructor, create a batch of threads, and pass the TLS key to each thread. Each thread allocates memory and stores the pointer in TLS; the TLS destructor frees that memory. I wait for all threads to finish before the app exits. The app is run under valgrind/massif, which reports this memory as not deallocated.
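#include <pthread.h>
#include <stdlib.h>
// Both functions are defined further down.
extern "C" void* my_thread(void* p);
extern "C" void my_destructor(void* p);
int main(int argc, char **argv)
{
    pthread_key_t* key = new pthread_key_t();
    pthread_key_create(key, my_destructor);
    pthread_t threads[32000];
    for(int i=0; i<32000; i++)
        pthread_create(&threads[i], NULL, my_thread, key);
    for(int i=0; i<32000; i++)
        pthread_join(threads[i], NULL);
    return 0;
}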
In the thread runner I allocate the memory and set it up in the TLS:
extern "C" void* my_thread(void* p)
{
    pthread_setspecific(*(pthread_key_t*)p, malloc(100));
    return NULL;
}
In the TLS destructor, I release the memory:
extern "C" void my_destructor(void *p)
{
    free(p);
}
I run this under valgrind/massif 3.19 with the following options:
--tool=massif
--heap=yes
--pages-as-heap=yes
--log-file=/tmp/my.log
--massif-out-file=/tmp/my.massif.log
Then I run ms_print /tmp/my.massif.log. The output reports leaks like the following:
| ->01.75% (67,108,864B) 0x76F92D0: new_heap (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x76F98D3: arena_get2.isra.3 (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x76FF77D: malloc (in /usr/lib64/libc-2.17.so)
| | ->01.75% (67,108,864B) 0x410300: my_thread (threadsT.cpp:136)
| | ...
| | <skipped by author>
| | ...
| |
| ->00.00% (73,728B) in 1 places, all below ms_print's threshold (01.00%)
...while I would not expect anything reported leaked at all.
I added instrumentation to my_destructor and manually verified that:
- it is indeed invoked
- it deallocates the memory, as it is supposed to
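The kind of instrumentation described above could look roughly like the sketch below. It is not the code from the original program: the names g_destructor_calls, counting_destructor, counting_thread and run_instrumented are hypothetical, and it assumes a C++11 compiler (for example -std=c++11 with -pthread). It counts destructor invocations and compares the count with the number of threads that were actually created.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical counter (not in the original post): number of destructor calls.
static std::atomic<int> g_destructor_calls{0};

extern "C" void counting_destructor(void* p)
{
    // Only called for non-NULL TLS values, so a failed malloc is not counted.
    g_destructor_calls.fetch_add(1);
    free(p);
}

extern "C" void* counting_thread(void* p)
{
    pthread_setspecific(*static_cast<pthread_key_t*>(p), malloc(100));
    return NULL;
}

// Starts up to n threads, joins the ones that were created, and returns
// how many times the TLS destructor ran (or -1 if the key cannot be created).
int run_instrumented(int n)
{
    g_destructor_calls = 0;
    pthread_key_t key;
    if (pthread_key_create(&key, counting_destructor) != 0)
        return -1;

    std::vector<pthread_t> threads;
    for (int i = 0; i < n; ++i) {
        pthread_t t;
        if (pthread_create(&t, NULL, counting_thread, &key) == 0)
            threads.push_back(t);   // keep only threads that really started
    }
    for (pthread_t t : threads)
        pthread_join(t, NULL);      // destructors run before join returns

    pthread_key_delete(key);
    return g_destructor_calls.load();
}
```

Calling run_instrumented(8) from main and printing the result is enough to see whether every started thread ran its destructor.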
Is there something apparent I am doing wrong here that makes valgrind/massif report these? Is it a valgrind/massif limitation that it cannot detect the memory deallocation when invoked from TLS destructors?
I am building and running this with gcc 4.9.4 on Red Hat Enterprise Linux Server release 7.9 (Maipo).
CodePudding user response:
You should check the return status for your thread creation. It's unlikely that you are succeeding in creating 32000 threads.
A bit of Valgrind source:
coregrind/pub_core_options.h:#define MAX_THREADS_DEFAULT 500
coregrind/m_scheduler/scheduler.c: VG_(printf)("Use --max-threads=INT to specify a larger number of threads\n"
Assuming that this is amd64 Linux, I believe that the default pthread stack size is 8 Mbytes. With 32000 threads, that means you need roughly 256 Gbytes for stack memory. Does your machine have that much?
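As a rough check of that estimate, the default stack size can be queried at run time instead of assumed. The sketch below uses hypothetical helper names (default_thread_stack_size, total_stack_bytes) and only multiplies the per-thread stack size by the thread count, so it gives the total stack reservation and nothing more.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Returns the stack size that pthread_create would use with default
// attributes, or 0 if the attributes cannot be queried.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return 0;
    std::size_t size = 0;
    int rc = pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return rc == 0 ? size : 0;
}

// Total bytes reserved for stacks by `threads` threads of `per_thread` bytes.
unsigned long long total_stack_bytes(unsigned long long threads,
                                     std::size_t per_thread)
{
    return threads * static_cast<unsigned long long>(per_thread);
}
```

With an 8 MiB stack, total_stack_bytes(32000, 8 * 1024 * 1024) gives 268,435,456,000 bytes, about 250 GiB, which matches the rough figure quoted above.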
Please try the following:
- Use pthread_attr_setstacksize to set the stack sizes to PTHREAD_STACK_MIN (16k).
- Run Valgrind with --max-threads=32001
Even with the above you may still hit some Valgrind limits such as VG_N_SEGMENTS.
If you see a message like
"Valgrind: FATAL: VG_N_SEGMENTS is too low.
Increase it and rebuild.
Exiting now."
then you will need to rebuild Valgrind with an increased limit.
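The suggestions in this answer (check the return status of thread creation, and shrink the stacks) might be combined as in the sketch below. The names small_stack_worker and create_checked are hypothetical, and the worker does nothing, so this only shows the error handling and the attribute setup, not the original program.

```cpp
#include <pthread.h>
#include <limits.h>   // PTHREAD_STACK_MIN
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical worker that returns immediately.
extern "C" void* small_stack_worker(void*) { return NULL; }

// Tries to create n threads with the given stack size, stops at the first
// failure, joins whatever was created and returns how many threads started.
int create_checked(int n, std::size_t stack_size)
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0)
        return 0;

    // pthread functions return the error number directly (errno is not set).
    int rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_attr_setstacksize: %s\n", std::strerror(rc));
        pthread_attr_destroy(&attr);
        return 0;
    }

    std::vector<pthread_t> threads;
    for (int i = 0; i < n; ++i) {
        pthread_t t;
        rc = pthread_create(&t, &attr, small_stack_worker, NULL);
        if (rc != 0) {
            std::fprintf(stderr, "pthread_create #%d: %s\n", i, std::strerror(rc));
            break;                  // typically EAGAIN when a limit is hit
        }
        threads.push_back(t);
    }
    pthread_attr_destroy(&attr);

    for (pthread_t t : threads)
        pthread_join(t, NULL);
    return static_cast<int>(threads.size());
}
```

Calling create_checked(32000, PTHREAD_STACK_MIN) and comparing the result with 32000 shows whether all threads were really created; under Valgrind this still needs --max-threads raised as described above.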
CodePudding user response:
A second answer, this time concentrating on the 'leak' aspect.
Massif isn't really a leak detector. It's for profiling heap use.
If I compile the example (with 320 threads), then at the end I get about 89 million bytes still allocated. That is made up of:
- 75%: the arena used by malloc called from start_thread
- 9%: pthread_create
- 15%: loading shared libraries
None of that looks like much of a concern to me. I assume that the start_thread memory is the pthread stack cache.
If I use massif for profiling malloc/new, then the last sample is:
  n      time(i)    total(B)    useful-heap(B)    extra-heap(B)    stacks(B)
 73    2,929,610       2,360             2,308               52            0