Error when trying to assign a string into a string variable inside a struct

Time:12-10

I'm writing a lexer, but I ran into a problem when I tried to assign a string to a string variable inside a struct.

--common.h--
#define TEST printf("--TEST--\n")

struct Token {
    char* ID;
    char* string;          // String variable
};

struct Token* tokenizer(char* input);

void PrintToken(struct Token* token);

--lexer.c--
#include <stdio.h>
#include <string.h>
#include "common.h"

struct Token* tokenizer(char* input)
{
    struct Token* token;

    int toknum = 0;

    int i = -1;

    while (1) {
        char* string;

        for (i += 1; input[i] != ' '; i++) {
            string[i] = input[i];
        }

        strcpy(token[toknum].string, string);       // The problem is here.

        if (input[i] == '\n' || input[i] == '\0')
            break;        

        toknum++;
    }

    return token;
}

void PrintToken(struct Token* token)
{
    for (int i = 0; i < 5; i++) {
        printf("%s\n", token[i].string);
    }
}

--main.c--
#include <stdio.h>
#include "common.h"

int main()
{
    char* input = "Hello there";

    struct Token* token = tokenizer(input);

    PrintToken(token);

    return 0;
}

After compiling the program above with gcc main.c lexer.c -o final.o and running final.o, I get this error:

Segmentation fault

I've tried to replace strcpy(token[toknum].string, string); with token[toknum].string = string;, but the result is the same.

Is there any way to avoid this error?

Answer:

You are using an uninitialized variable:

struct Token* token;

This only defines a pointer variable but does not assign it any valid content. Its value is indeterminate, and reading it causes undefined behaviour.

Also this variable does not point to a valid address. Writing to a "random" address via strcpy also causes undefined behaviour.

You must allocate dynamic memory for this:

struct Token *token = malloc(sizeof(*token));
// don't forget error handling

Then when you want more entries, enlarge the memory:

struct Token *temp = realloc(token, (toknum + 1) * sizeof(*token));
if (NULL != temp) {
  token = temp;
} else {
  // error handling: token still points to the old, valid block
}
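A minimal sketch of this grow-by-one pattern, assuming the question's struct Token; push_token is a hypothetical helper name, not from the original code. Because realloc(NULL, size) behaves like malloc(size), the array can start out as NULL with a count of 0, which makes the separate initial malloc optional.

```c
#include <stdlib.h>

/* Same layout as the question's struct. */
struct Token {
    char* ID;
    char* string;
};

/* Hypothetical helper: appends tok to a heap array that currently holds
 * *count elements, growing it by one element with realloc.
 * Returns 0 on success, -1 if realloc fails; on failure *tokens and
 * *count are left unchanged and the old array is still valid. */
static int push_token(struct Token** tokens, size_t* count, struct Token tok)
{
    struct Token* temp = realloc(*tokens, (*count + 1) * sizeof(**tokens));
    if (temp == NULL)
        return -1;              /* caller still owns the old *tokens */
    *tokens = temp;
    (*tokens)[*count] = tok;    /* copy the new element into the last slot */
    (*count)++;
    return 0;
}
```

Growing by one element per token keeps the code simple but calls realloc for every token; doubling the capacity instead is a common refinement when many tokens are expected.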

Just using an array like this:

struct Token token[x];

would reserve memory, but you cannot return its address from the function: the array's lifetime ends when the function returns, so the caller must not access it afterwards.


The same problem arises with string:

   char* string;

   for (i += 1; input[i] != ' '; i++) {
        string[i] = input[i];  // <<< string is uninitialized, does not point to valid memory.
   }

You do not provide any memory for the string. Here you could use a local array to hold the string.
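A minimal sketch of the local-array approach. MAX_WORD and copy_word are hypothetical names and the 64-byte size is an arbitrary assumption. It finds the end of the word with strcspn from <string.h> instead of a hand-written loop, writes the copy starting at index 0 (the question's loop writes string[i], reusing the input index), and always appends the terminating '\0'.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64  /* hypothetical buffer size: 63 chars + terminating '\0' */

/* Hypothetical helper: copies the word starting at input[start] into buf,
 * which has room for bufsize chars (bufsize must be at least 1).
 * The word ends at ' ', '\n' or the end of input; longer words are
 * truncated to bufsize - 1 chars. buf is always NUL-terminated.
 * Returns the index in input just past the copied characters. */
static size_t copy_word(const char* input, size_t start,
                        char* buf, size_t bufsize)
{
    /* Number of chars before the next ' ' or '\n' (stops at '\0' too). */
    size_t len = strcspn(input + start, " \n");

    if (len > bufsize - 1)       /* truncate words that do not fit */
        len = bufsize - 1;
    memcpy(buf, input + start, len);
    buf[len] = '\0';             /* strcpy and printf("%s") rely on this */
    return start + len;
}
```

Inside the loop this would be used as char word[MAX_WORD]; followed by i = copy_word(input, i, word, sizeof word);.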


The same issue applies to token[toknum].string.

Your struct contains only pointers, so again you need to reserve memory: either allocate it dynamically as well, or make token.string an array of fixed length.

Your attempt to use token[toknum].string = string; would work if token[toknum] and string were both valid.
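A minimal sketch of the dynamic-allocation option for token.string: each token gets its own heap copy of the word, so the string stays valid after tokenizer returns and is not shared with other tokens. set_token_string is a hypothetical helper, and the caller is assumed to free(tok->string) later.

```c
#include <stdlib.h>
#include <string.h>

struct Token {
    char* ID;
    char* string;
};

/* Hypothetical helper: gives tok its own heap copy of the len chars at src
 * (src does not need to be NUL-terminated, e.g. a slice of input).
 * Returns 0 on success, -1 if malloc fails (tok is then unchanged).
 * The caller must later free(tok->string). */
static int set_token_string(struct Token* tok, const char* src, size_t len)
{
    char* copy = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);
    copy[len] = '\0';
    tok->string = copy;
    return 0;
}
```

The fixed-length alternative would declare something like char string[64]; in the struct (64 is an arbitrary choice) and copy into it with a bounds check; that avoids malloc and free but truncates or rejects longer words.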


There is also another issue in the same piece:

    for (i += 1; input[i] != ' '; i++) {
        string[i] = input[i];
    }

    strcpy(token[toknum].string, string); // << strcpy expects a nul-terminated string

You do not terminate your string properly. In this case strcpy will happily walk through memory far beyond the end of your allocation until it happens to find a terminating \0 byte.


And what happens if input contains no further space character? The loop will eventually reach the terminating '\0' of input and just keep going...

That's two more causes of undefined behaviour waiting for you.


That's only what I found in tokenizer at first glance.
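Putting the fixes for tokenizer together, a corrected version might look like the following sketch. It is one possible fix under stated assumptions, not the only one: the token count is returned through a new size_t* count parameter (the original signature gives the caller no way to know how many tokens exist), ID is left NULL because the question does not show how it is set, and repeated spaces are skipped.

```c
#include <stdlib.h>
#include <string.h>

struct Token {
    char* ID;
    char* string;
};

/* Splits input into space-separated words, stopping at '\n' or '\0'.
 * On success returns a malloc'ed array and stores its length in *count
 * (the array is NULL when there are no words). On allocation failure
 * everything allocated so far is freed, NULL is returned, *count is 0.
 * The caller frees each tokens[k].string and then the array itself. */
struct Token* tokenizer(const char* input, size_t* count)
{
    struct Token* tokens = NULL;   /* realloc(NULL, ...) acts like malloc */
    size_t n = 0;                  /* tokens stored so far */
    size_t i = 0;                  /* current position in input */

    *count = 0;
    while (input[i] != '\0' && input[i] != '\n') {
        if (input[i] == ' ') {     /* skip separators, including repeats */
            i++;
            continue;
        }

        /* Word length: strcspn also stops at the terminating '\0'. */
        size_t len = strcspn(input + i, " \n");

        struct Token* temp = realloc(tokens, (n + 1) * sizeof *tokens);
        if (temp == NULL)
            goto fail;             /* old array is still valid */
        tokens = temp;

        char* word = malloc(len + 1);   /* +1 for '\0' */
        if (word == NULL)
            goto fail;
        memcpy(word, input + i, len);
        word[len] = '\0';

        tokens[n].ID = NULL;       /* assumption: ID is not filled in here */
        tokens[n].string = word;
        n++;
        i += len;
    }

    *count = n;
    return tokens;

fail:
    for (size_t k = 0; k < n; k++)
        free(tokens[k].string);
    free(tokens);
    return NULL;
}
```

In the question's main this would be called as size_t n; struct Token* token = tokenizer(input, &n);, after which exactly n tokens can be printed and freed.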

In function PrintToken you have another issue: what makes you think you can print 5 elements of token? You never reserve memory for 5 elements, and even if you did, you don't initialize the excess elements to contain an empty string.
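One way to address this is to pass the number of tokens to PrintToken and print exactly that many. The count parameter and the FreeTokens helper are assumptions, not part of the original code; FreeTokens assumes each string was allocated with malloc.

```c
#include <stdio.h>
#include <stdlib.h>

struct Token {
    char* ID;
    char* string;
};

/* Prints exactly `count` tokens instead of a hard-coded 5, so it never
 * touches elements that were not allocated or not initialized.
 * The count parameter is an addition to the question's signature. */
void PrintToken(const struct Token* token, size_t count)
{
    for (size_t i = 0; i < count; i++)
        printf("%s\n", token[i].string);
}

/* Hypothetical helper: releases an array whose strings were malloc'ed. */
void FreeTokens(struct Token* token, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(token[i].string);
    free(token);   /* free(NULL) is a no-op, so an empty array is fine */
}
```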

Answer:

I don't think that the problem is in strcpy, but in tokenizer function.

The line struct Token* token; is dangerous, because you use token as an array but never allocate memory for it. Try using struct Token token[5] (declared in the caller, since an array local to tokenizer cannot be returned) if your array has a fixed number of elements, or use malloc accordingly.

The same problem applies to char *string.
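A fixed array such as struct Token token[5] works as long as it is declared in the caller (for example in main) rather than inside tokenizer, because its lifetime then does not end when tokenizer returns. A minimal sketch under those assumptions: MAX_TOKENS, MAX_LEN and the changed tokenizer signature are hypothetical, and string becomes a fixed-length array so no malloc is needed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 5   /* the fixed element count suggested in this answer */
#define MAX_LEN 32     /* hypothetical max word length, including '\0' */

struct Token {
    char* ID;              /* unused in this sketch, set to NULL */
    char string[MAX_LEN];  /* fixed-size storage, no malloc needed */
};

/* Fills the caller's array `tokens` (capacity `max`) with space-separated
 * words from input, stopping at '\n' or '\0', and returns how many were
 * stored. Words longer than MAX_LEN - 1 are truncated; words beyond `max`
 * are dropped. The array belongs to the caller, so it stays valid. */
size_t tokenizer(const char* input, struct Token tokens[], size_t max)
{
    size_t n = 0;
    size_t i = 0;

    while (n < max && input[i] != '\0' && input[i] != '\n') {
        if (input[i] == ' ') {     /* skip separators */
            i++;
            continue;
        }
        size_t len = strcspn(input + i, " \n");
        size_t copy = len < MAX_LEN - 1 ? len : MAX_LEN - 1;
        memcpy(tokens[n].string, input + i, copy);
        tokens[n].string[copy] = '\0';
        tokens[n].ID = NULL;
        n++;
        i += len;                  /* skip the whole word, even if truncated */
    }
    return n;
}
```

In main this would be used as struct Token tokens[MAX_TOKENS]; size_t n = tokenizer(input, tokens, MAX_TOKENS);, and only the first n elements should be printed.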
