Home > Software design >  Function returns char**s, running the function twice causes the return value to differ
Function returns char**s, running the function twice causes the return value to differ

Time:02-18

Description of what my function attempts to do

My function gets a string for example "Ab abc EF aG hi jkL" and turns it into ["abc", "hi"]

In addition, the function only takes into account letters and the letters all have to be lowercase.

The problem is that

char* str1 = "Ab    abc EF  aG hi  jkL";
char* str2 = "This is a very famous quote";

char** tokens1 = get_tokens(str1); 
printf("%s", tokens1[0]);            <----- prints out "abc" correct output
char** tokens2 = get_tokens(str2);
printf("%s", tokens1[0]);            <----- prints out "s" incorrect output

get_tokens function (Returns the 2d array)

char** get_tokens(const char* str) {
  // implement me
  int num_tokens = count_tokens(str); 

  char delim[] = " ";
  int str_length = strlen(str);
  char* new_str = malloc(str_length); 
  strcpy(new_str, str); 

  char* ptr = strtok(new_str, delim);
  int index = 0;

  char** array_2d = malloc(sizeof(char*) *num_tokens);

  while (ptr != NULL){
    if (check_string(ptr) == 0){

      array_2d[index] = ptr; 
      index  ;
    }

    ptr = strtok(NULL, delim); 
  } 

  free(new_str); 
  new_str = NULL; 

  free(ptr);
  ptr = NULL; 

  return array_2d;
}  

count_tokens function (returns the number of valid strings)

for example count_tokens("AB abc EF aG hi jkL") returns 2 because only "abc" and "hi" are valid

int count_tokens(const char* str) {
  // implement me
  //Seperate string using strtok

  char delim[] = " ";
  int str_length = strlen(str);
  char* new_str = malloc(str_length); 
  strcpy(new_str, str); 

  char* ptr = strtok(new_str, delim); 

  int counter = 0; 

  while (ptr != NULL){
    if (check_string(ptr) == 0){
      counter  ;
    }

    ptr = strtok(NULL, delim); 
  }
  free(new_str);     
  return counter;
}  


Lastly check_string() checks if a string is valid

For example check_string("Ab") is invalid because there is a A inside.

using strtok to split "Ab abc EF aG hi jkL" into separate parts

int check_string(char* str){ 
  // 0 = false 
  // 1 = true
  int invalid_chars = 0; 

   for (int i = 0; i<strlen(str); i  ){
     int char_int_val = (int) str[i];
     if (!((char_int_val >= 97 && char_int_val <= 122))){
        invalid_chars = 1; 
    }
   }

  return invalid_chars;  
}


Any help would be much appreciated. Thank you for reading.

If you have any questions about how the code works please ask me. Also I'm new to stackoverflow, please tell me if I have to change something.

CodePudding user response:

You have a few problems in your code. First I'll repeat what I've said in the comments:

  • Not allocating enough space for the string copies. strlen does not include the NUL terminator in its length, so when you do
char* new_str = malloc(str_length); 
strcpy(new_str, str);

new_str gets overwritten by 1 when strcpy adds the '\0', invoking undefined behavior. You need to allocate one extra:

char* new_str = malloc(str_length   1); 
strcpy(new_str, str);

Your final problem is because of this:

// copy str to new_str, that's correct because strtok
// will manipulate the string you pass into it
strcpy(new_str, str);  
// get the first token and allocate size for the number of tokens,
// so far so good (but you should check that malloc succeeded)
char* ptr = strtok(new_str, delim);
char** array_2d = malloc(sizeof(char*) *num_tokens);

while (ptr != NULL){
    if (check_string(ptr) == 0){
      // whoops, this is where the trouble starts ...
      array_2d[index] = ptr; 
      index  ;
    }
    // get the next token, this is correct
    ptr = strtok(NULL, delim); 
  } 
  // ... because you free new_str
  free(new_str); 

ptr is a pointer to some token in new_str. As soon as you free(new_str), Any pointer pointing to that now-deallocated memory is invalid. You've loaded up array_2d with pointers to memory that's no longer allocated. Trying to access those locations again invokes undefined behavior. There's two ways I can think of off the top to solve this:

  1. Instead of saving pointers that are offsets to new_str, find the same tokens in str (the string from main) and point to those instead. Since those are defined in main, they will exist for as long as the program exists.
  2. Allocate some more memory, and strcpy the token into array_2d[index]. I'll demonstrate this below:
while (ptr != NULL){
    if (check_string(ptr) == false)
    {
      // allocate (enough) memory for the pointer at index
      array_2d[index] = malloc(strlen(ptr)   1);
      // you should _always_ check that malloc succeeds
      if (array_2d[index] != NULL)
      {
          // _copy_ the string pointed to by ptr into our new space rather
          // than simply assigning the pointer
          strcpy(array_2d[index], ptr);
      }
      else { /* handle no mem error how you want */ }
      index  ;
    }

    ptr = strtok(NULL, delim); 
}

// now we can safely free new_str without invalidating anything in array_2d
free(new_str); 

I have a working demonstration here. Note some other things:

  • I've #include <stdbool.h> and used that instead of 0 and 1 ints.
  • I've changed your get_tokens function a bit to "return" the number of tokens. This is useful in main for printing them out.
  • I've replaced the ASCII magic numbers with their characters.
  • I've removed the useless freedPointer = NULL lines.

One final note, while this is a valid implementation, it's probably doing a bit more work than it needs to. Rather than counting the number of tokens in a first pass, then retrieving them in a second pass, you can surely do everything you want in a single pass, but I'll leave that as an exercise to you if you're so inclined.

  • Related