I have a file containing the strings of 3 chromosome, which I want to concatenate into one genome. And then I have to access this concatenated string across multiple threads (I use pthread_t). To to this I have to use pthread_mutex_lock when extracting the data, then I use strcat to concatenate the data which are extracted using the function fai_fetch using const char* and then I am saving the data as a char* (see below).
// genome_size the size of all the chromosomes together
// chr_total the number of chromosomes I wish to concatenate
char* genome = (char*) malloc(sizeof(char) * (genome_size chr_total));
for (int i = 0; i < chr_total; i ){
pthread_mutex_lock(&data_mutex);
const char *data = fai_fetch(seq_ref,chr_names[i],&chr_sizes[i]);
pthread_mutex_unlock(&data_mutex);
//sprintf(&genome[strlen(genome)],data);
strcat(genome,data);
//sprintf(genome strlen(genome),data); //All three gives conditional jump or move error
//sprintf(genome,data); // THIS SOLVES VALGRIND ISSUE ONE BUT DOES NOT GIVE A CONCATENATED CHAR*
}
All of this works, but then running valgrind I get
Conditional jump or move depends on uninitialized value(s) referring to the "strcat(genome,data); "
and Uninitialized value was created by a heap allocation referring to "char* genome = (char*) malloc(sizeof(char) * (genome_size chr_total));"
Based on other StackOverflow answers I tried sprintf(&genome[strlen(genome)],data); and sprintf(genome strlen(genome),data); instead of strcat. However they too gives the same valgrind message.
The only thing that seems to alleviate this error is using sprintf(genome,data); however then i will not get the full genome but just a single chromosome.
Trying genome = sprintf(genome,data); gives me ./a.out': munmap_chunk(): invalid pointer: and ./a.out': free()
In regards to the "Uninitialized value was created by a heap allocation" error -> then my issue is that I am only able to free that memory after all of the threads are done running. So I am not sure how to initialize the values in the heap allocation when I am using malloc.
Is it possible to solve some of these specific valgrind errors?
CodePudding user response:
Using Valgrind to locate the problematic code
The "Conditional jump or move depends on uninitialised value(s)" message means Valgrind has determined that some result of your program depends on uninitialized memory.
Use the --track-origins=yes
flag to track the origin of the uninitialized value. It might help you finding that value. From man 1 valgrind
:
When set to yes, Memcheck keeps track of the origins of all uninitialised values. Then, when an uninitialised value error is reported, Memcheck will try to show the origin of the value. An origin can be one of the following four places: a heap block, a stack allocation, a client request, or miscellaneous other sources (eg, a call to brk).
More specifically in your program:
Problem 1: Using uninitialized genome
The line
char* genome = (char*) malloc(sizeof(char) * (genome_size chr_total));
Allocates the genome
buffer using malloc(2)
and then consumes it in:
strcat(genome,data);
Please note that functions such as strlen(3)
and strcat(3)
works on C-strings, which are buffers that terminate with a null character ('\0').
malloc(2)
just allocates memory and it doesn't initialize it so your allocated buffer may contain any value (and considered as uninitialized). You should avoid using string related functions with an uninitialized buffers as it results undefined behavior.
Fortunately calloc(2)
does the trick - it allocates the buffer and initializes all of its bits to zero, resulting a valid 0 length C-string you can operate on. I suggest the following fix to ensure that genome
is initialized:
char* genome = calloc(genome_size chr_total 1, sizeof(char));
Also note that I've added 1
to the length of the allocated buffer. It is done to guarantee that the resulting genome
will end with a null terminator (assuming that genome_size chr_total
is the total size of all the buffers returned from fai_fetch
).
Also note that in terms of performance calloc
is a bit slower than malloc
(because it initializes the data) but for my opinion it is safer as it initializes the whole buffer. For the purposes of your specific program, you could save the performance burden by using malloc
and initializing just the first byte:
char* genome = malloc(sizeof(char) * (genome_size chr_total 1));
if (NULL == genome) {
perror("malloc of genome failed");
exit(1)
}
// So it will be a valid 0 length c-string
genome[0] = 0;
We don't have to initialize the last byte to be 0 because strcat
writes the terminating null character for us.
(Potential) Problem 2: Using potentially non-null terminated data
with strcat
As you described in your question, the fai_fetch
extracts some data:
const char *data = fai_fetch(seq_ref,chr_names[i],&chr_sizes[i]);
and then consumes it in the strcat
line:
strcat(genome,data);
As I wrote above, because you use strcat
, data
should be null-terminated as well.
I'm not sure how fai_fetch
is implemented but if it returns a valid C-string then you're all good.
If it doesn't then you should use strncat
which works on buffers that are not null-terminated.
From man 3 strcat
:
The strncat() function is similar, except that
- it will use at most n bytes from src; and
- src does not need to be null-terminated if it contains n or more bytes.
I suggest the following fix:
// I'm not sure what type `&chr_sizes[i]` is, assuming it's `size_t`
size_t length = &chr_sizes[i];
const char *data = fai_fetch(seq_ref,chr_names[i], length);
// Use strcat
strncat(genome, used_data, length);