Home > Net >  Cannot access empty string from array of strings in C
Cannot access empty string from array of strings in C

Time:11-22

I'm using an array of strings in C to hold arguments given to a custom shell. I initialize the array of buffers using:

char *args[MAX_CHAR];

Once I parse the arguments, I send them to the following function to determine the type of IO redirection if there are any (this is just the first of 3 functions to check for redirection and it only checks for STDIN redirection).

int parseInputFile(char **args, char *inputFilePath) {
    char *inputSymbol = "<";
    int isFound = 0;
    
    for (int i = 0; i < MAX_ARG; i  ) {
        if (strlen(args[i]) == 0) {
            isFound = 0;
            break;
        }
        if ((strcmp(args[i], inputSymbol)) == 0) {
            strcpy(inputFilePath, args[i 1]);
            isFound = 1;
            break;
        }
    }
    return isFound;
}

Once I compile and run the shell, it crashes with a SIGSEGV. Using GDB I determined that the shell is crashing on the following line:

if (strlen(args[i]) == 0) {

This is because the address of arg[i] (the first empty string after the parsed commands) is inaccessible. Here is the error from GDB and all relevant variables:

(gdb) next
359         if (strlen(args[i]) == 0) {
(gdb) p args[0]
$1 = 0x7fffffffe570 "echo"
(gdb) p args[1]
$2 = 0x7fffffffe575 "test"
(gdb) p args[2]
$3 = 0x0
(gdb) p i
$4 = 2
(gdb) next

Program received signal SIGSEGV, Segmentation fault.
parseInputFile (args=0x7fffffffd570, inputFilePath=0x7fffffffd240 "") at shell.c:359
359         if (strlen(args[i]) == 0) {

I believe that the p args[2] returning $3 = 0x0 means that because the index has yet to be written to, it is mapped to address 0x0 which is out of the bounds of execution. Although I can't figure out why this is because it was declared as a buffer. Any suggestions on how to solve this problem?

EDIT: Per Kaylum's comment, here is a minimal reproducible example

#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include <sys/stat.h>
#include<readline/readline.h>
#include<readline/history.h>
#include <fcntl.h>

// Defined values
#define MAX_CHAR 256
#define MAX_ARG 64
#define clear() printf("\033[H\033[J")  // Clear window
#define DEFAULT_PROMPT_SUFFIX "> "

char PROMPT[MAX_CHAR], SPATH[1024];


int parseInputFile(char **args, char *inputFilePath) {
    char *inputSymbol = "<";
    int isFound = 0;
    
    for (int i = 0; i < MAX_ARG; i  ) {
        if (strlen(args[i]) == 0) {
            isFound = 0;
            break;
        }
        if ((strcmp(args[i], inputSymbol)) == 0) {
            strcpy(inputFilePath, args[i 1]);
            isFound = 1;
            break;
        }
    }
    return isFound;
}

int ioRedirectHandler(char **args) {
    char inputFilePath[MAX_CHAR] = "";

    // Check if any redirects exist
    if (parseInputFile(args, inputFilePath)) {
        return 1;
    } else {
        return 0;
    }
}

void parseArgs(char *cmd, char **cmdArgs) {
    int na;
    // Separate each argument of a command to a separate string
    for (na = 0; na < MAX_ARG; na  ) {
        cmdArgs[na] = strsep(&cmd, " ");
        if (cmdArgs[na] == NULL) {
            break;
        }
        if (strlen(cmdArgs[na]) == 0) {
            na--;
        }
    }
}

int processInput(char* input, char **args, char **pipedArgs) {
    // Parse the single command and args
    parseArgs(input, args);
    return 0;
}

int getInput(char *input) {
    char *buf, loc_prompt[MAX_CHAR] = "\n";

    strcat(loc_prompt, PROMPT);
    buf = readline(loc_prompt);
    if (strlen(buf) != 0) {
        add_history(buf);
        strcpy(input, buf);
        return 0;
    } else {
        return 1;
    }
}

void init() {
    char *uname;
    clear();
    uname = getenv("USER");
    printf("\n\n \t\tWelcome to Student Shell, %s! \n\n", uname);
    
    // Initialize the prompt
    snprintf(PROMPT, MAX_CHAR, "%s%s", uname, DEFAULT_PROMPT_SUFFIX);
}

int main() {
    char input[MAX_CHAR];
    char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
    int isPiped = 0, isIORedir = 0;
    init();

    while(1) {
        // Get the user input
        if (getInput(input)) {
            continue;
        }
        isPiped = processInput(input, args, pipedArgs);

        isIORedir = ioRedirectHandler(args);
    }
    return 0;
}

Note: If I forgot to include any important information, please let me know and I can get it updated.

CodePudding user response:

When you write

char *args[MAX_CHAR];

you allocate room for MAX_CHAR pointers to char. You do not initialise the array. If it is a global variable, you will have initialised all the pointers to NULL, but you do it in a function, so the elements in the array can point anywhere. You should not dereference them before you have set the pointers to point at something you are allowed to access.

You also do this, though, in parseArgs(), where you do this:

    cmdArgs[na] = strsep(&cmd, " ");

There are two potential issues here, but let's deal with the one you hit first. When strsep() is through the tokens you are splitting, it returns NULL. You test for that to get out of parseArgs() so you already know this. However, where your program crashes you seem to have forgotten this again. You call strlen() on a NULL pointer, and that is a no-no.

There is a difference between NULL and the empty string. An empty string is a pointer to a buffer that has the zero char first; the string "" is a pointer to a location that holds the character '\0'. The NULL pointer is a special value for pointers, often address zero, that means that the pointer doesn't point anywhere. Obviously, the NULL pointer cannot point to an empty string. You need to check if an argument is NULL, not if it is the empty string.

If you want to check both for NULL and the empty string, you could do something like

if (!args[i] || strlen(args[i]) == 0) {

If args[i] is NULL then !args[i] is true, so you will enter the if body if you have NULL or if you have a pointer to an empty string.

(You could also check the empty string with !(*args[i]); *args[i] is the first character that args[i] points at. So *args[i] is zero if you have the empty string; zero is interpreted as false, so !(*args[i]) is true if and only if args[i] is the empty string. Not that this is more readable, but it shows again the difference between empty strings and NULL).

I mentioned another issue with the parsed arguments. Whether it is a problem or not depends on the application. But when you parse a string with strsep(), you get pointers into the parsed string. You have to be careful not to free that string (it is input in your main() function) or to modify it after you have parsed the string. If you change the string, you have changed what all the parsed strings look at. You do not do this in your program, so it isn't a problem here, but it is worth keeping in mind. If you want your parsed arguments to survive longer than they do now, after the next command is passed, you need to copy them. The next command that is passed will change them as it is now.

CodePudding user response:

In main

char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];

are all uninitialized. They contain indeterminate values. This could be a potential source of bugs, but is not the reason here, as

  • getInput modifies the contents of input to be a valid string before any reads occur.
  • pipedArgs is unused, so raises no issues (yet).
  • args is modified by parseArgs to (possibly!) contain a NULL sentinel value, without any indeterminate pointers being read first.

Firstly, in parseArgs it is possible to completely fill args without setting the NULL sentinel value that other parts of the program should rely on.

Looking deeper, in parseInputFile the following

if (strlen(args[i]) == 0)

contradicts the limits imposed by parseArgs that disallows empty strings in the array. More importantly, args[i] may be the sentinel NULL value, and strlen expects a non-NULL pointer to a valid string.

This termination condition should simply check if args[i] is NULL.

With

strcpy(inputFilePath, args[i 1]);

args[i 1] might also be the NULL sentinel value, and strcpy also expects non-NULL pointers to valid strings. You can see this in action when inputSymbol is a match for the final token in the array.

args[i 1] may also evaluate as args[MAX_ARGS], which would be out of bounds.

Additionally, inputFilePath has a string length limit of MAX_CHAR - 1, and args[i 1] is (possibly!) a dynamically allocated string whose length might exceed this.


Some edge cases found in getInput:

Both arguments to

strcat(loc_prompt, PROMPT);

are of the size MAX_CHAR. Since loc_prompt has a length of 1. If PROMPT has the length MAX_CHAR - 1, the resulting string will have the length MAX_CHAR. This would leave no room for the NUL terminating byte.

readline can return NULL in some situations, so

buf = readline(loc_prompt);
if (strlen(buf) != 0) {

can again pass the NULL pointer to strlen.

A similar issue as before, on success readline returns a string of dynamic length, and

strcpy(input, buf);

can cause a buffer overflow by attempting to copy a string greater in length than MAX_CHAR - 1.

buf is a pointer to data allocated by malloc. It's unclear what add_history does, but this pointer must eventually be passed to free.


Some considerations.

Firstly, it is a good habit to initialize your data, even if it might not matter.

Secondly, using constants (#define MAX_CHAR 256) might help to reduce magic numbers, but they can lead you to design your program too rigidly if used in the same way.

Consider building your functions to accept a limit as an argument, and return a length. This allows you to more strictly track the sizes of your data, and prevents you from always designing around the maximum potential case.

A slightly contrived example of designing like this. We can see that find does not have to concern itself with possibly checking MAX_ARGS elements, as it is told precisely how long the list of valid elements is.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 100

char *get_input(char *dest, size_t sz, const char *display) {
    char *res;

    if (display)
        printf("%s", display);

    if ((res = fgets(dest, sz, stdin)))
        dest[strcspn(dest, "\n")] = '\0';

    return res;
}

size_t find(char **list, size_t length, const char *str) {
    for (size_t i = 0; i < length; i  )
        if (strcmp(list[i], str) == 0)
            return i;

    return length;
}

size_t split(char **list, size_t limit, char *source, const char *delim) {
    size_t length = 0;
    char *token;

    while (length < limit && (token = strsep(&source, delim)))
        if (*token)
            list[length  ] = token;

    return length;
}

int main(void) {
    char input[512] = { 0 };
    char *args[MAX_ARGS] = { 0 };

    puts("Welcome to the shell.");

    while (1) {
        if (get_input(input, sizeof input, "$ ")) {
            size_t argl = split(args, MAX_ARGS, input, " ");
            size_t redirection = find(args, argl, "<");

            puts("Command parts:");

            for (size_t i = 0; i < redirection; i  )
                printf("%zu: %s\n", i, args[i]);

            puts("Input files:");

            if (redirection == argl)
                puts("[[NONE]]");
            else for (size_t i = redirection   1; i < argl; i  )
                printf("%zu: %s\n", i, args[i]);

        }
    }
}
  •  Tags:  
  • c gdb
  • Related