Description of what my function attempts to do
My function gets a string for example "Ab abc EF aG hi jkL" and turns it into ["abc", "hi"]
In addition, the function only takes into account letters and the letters all have to be lowercase.
The problem is that
char* str1 = "Ab abc EF aG hi jkL";
char* str2 = "This is a very famous quote";
char** tokens1 = get_tokens(str1);
printf("%s", tokens1[0]); <----- prints out "abc" correct output
char** tokens2 = get_tokens(str2);
printf("%s", tokens1[0]); <----- prints out "s" incorrect output
get_tokens function (Returns the 2d array)
char** get_tokens(const char* str) {
// implement me
int num_tokens = count_tokens(str);
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int index = 0;
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
array_2d[index] = ptr;
index ;
}
ptr = strtok(NULL, delim);
}
free(new_str);
new_str = NULL;
free(ptr);
ptr = NULL;
return array_2d;
}
count_tokens function (returns the number of valid strings)
for example count_tokens("AB abc EF aG hi jkL") returns 2 because only "abc" and "hi" are valid
int count_tokens(const char* str) {
// implement me
//Seperate string using strtok
char delim[] = " ";
int str_length = strlen(str);
char* new_str = malloc(str_length);
strcpy(new_str, str);
char* ptr = strtok(new_str, delim);
int counter = 0;
while (ptr != NULL){
if (check_string(ptr) == 0){
counter ;
}
ptr = strtok(NULL, delim);
}
free(new_str);
return counter;
}
Lastly check_string() checks if a string is valid
For example check_string("Ab") is invalid because there is a A inside.
using strtok to split "Ab abc EF aG hi jkL" into separate parts
int check_string(char* str){
// 0 = false
// 1 = true
int invalid_chars = 0;
for (int i = 0; i<strlen(str); i ){
int char_int_val = (int) str[i];
if (!((char_int_val >= 97 && char_int_val <= 122))){
invalid_chars = 1;
}
}
return invalid_chars;
}
Any help would be much appreciated. Thank you for reading.
If you have any questions about how the code works please ask me. Also I'm new to stackoverflow, please tell me if I have to change something.
CodePudding user response:
You have a few problems in your code. First I'll repeat what I've said in the comments:
- Not allocating enough space for the string copies.
strlen
does not include the NUL terminator in its length, so when you do
char* new_str = malloc(str_length);
strcpy(new_str, str);
new_str
gets overwritten by 1 when strcpy
adds the '\0'
, invoking undefined behavior. You need to allocate one extra:
char* new_str = malloc(str_length 1);
strcpy(new_str, str);
- You should not
free
any pointer returned fromstrtok
. You onlyfree
memory that's been dynamically allocated usingmalloc
and friends.strtok
does no such thing, so it's incorrect tofree
the pointer it returns. Doing so also invokes UB.
Your final problem is because of this:
// copy str to new_str, that's correct because strtok
// will manipulate the string you pass into it
strcpy(new_str, str);
// get the first token and allocate size for the number of tokens,
// so far so good (but you should check that malloc succeeded)
char* ptr = strtok(new_str, delim);
char** array_2d = malloc(sizeof(char*) *num_tokens);
while (ptr != NULL){
if (check_string(ptr) == 0){
// whoops, this is where the trouble starts ...
array_2d[index] = ptr;
index ;
}
// get the next token, this is correct
ptr = strtok(NULL, delim);
}
// ... because you free new_str
free(new_str);
ptr
is a pointer to some token in new_str
. As soon as you free(new_str)
, Any pointer pointing to that now-deallocated memory is invalid. You've loaded up array_2d
with pointers to memory that's no longer allocated. Trying to access those locations again invokes undefined behavior. There's two ways I can think of off the top to solve this:
- Instead of saving pointers that are offsets to
new_str
, find the same tokens instr
(the string frommain
) and point to those instead. Since those are defined inmain
, they will exist for as long as the program exists. - Allocate some more memory, and
strcpy
the token intoarray_2d[index]
. I'll demonstrate this below:
while (ptr != NULL){
if (check_string(ptr) == false)
{
// allocate (enough) memory for the pointer at index
array_2d[index] = malloc(strlen(ptr) 1);
// you should _always_ check that malloc succeeds
if (array_2d[index] != NULL)
{
// _copy_ the string pointed to by ptr into our new space rather
// than simply assigning the pointer
strcpy(array_2d[index], ptr);
}
else { /* handle no mem error how you want */ }
index ;
}
ptr = strtok(NULL, delim);
}
// now we can safely free new_str without invalidating anything in array_2d
free(new_str);
I have a working demonstration here. Note some other things:
- I've
#include <stdbool.h>
and used that instead of 0 and 1int
s. - I've changed your
get_tokens
function a bit to "return" the number of tokens. This is useful inmain
for printing them out. - I've replaced the ASCII magic numbers with their characters.
- I've removed the useless
freedPointer = NULL
lines.
One final note, while this is a valid implementation, it's probably doing a bit more work than it needs to. Rather than counting the number of tokens in a first pass, then retrieving them in a second pass, you can surely do everything you want in a single pass, but I'll leave that as an exercise to you if you're so inclined.