Splitting a comma separated string into 2 different arrays in C

Time:10-04

I have a text file with 10000 lines, and each line has this exact format.

ABC,123

I am trying to split these two values into 2 different arrays, where one array holds the letters, and another array holds the numbers. How should I do this?

CodePudding user response:

Your task involves parsing a long text file into arrays of data held in memory. To avoid the need to reallocate the arrays, which you will learn about later, you could just define the arrays with a length of 10000 and then read one line at a time from the file, analyse the line contents and store the appropriate parts into the next elements of the arrays.

Here is an illustrative approach:

#include <stdio.h>
#include <string.h>

#define COUNT  10000

int main() {
    FILE *fp;              // stream pointer to read the text file
    char line[80];         // buffer for a line of input
    char name[COUNT][32];  // array of strings for the names
    int num[COUNT];        // array of integers for the numbers
    int n;                 // number of lines read

    fp = fopen("input.txt", "r");
    if (fp == NULL) {
        fprintf(stderr, "cannot open input file\n");
        return 1;
    }
    n = 0;
    // reading one line at a time for a maximum of COUNT entries
    while (n < COUNT && fgets(line, sizeof line, fp)) {
        // skip white space
        char *p = line + strspn(line, " \t\r\n");
        // ignore empty lines and comment lines
        if (*p == '\0' || *p == '#') {
            continue;
        }
        if (sscanf(p, "%31[^,],%d", name[n], &num[n]) == 2) {
            // 2 fields converted successfully: one more entry in the array
            n++;
        } else {
            // conversion was not successful: issue an error message
            fprintf(stderr, "invalid format: %s\n", line);
        }
    }
    fclose(fp);
    printf("number of entries read: %d\n", n);
    for (int i = 0; i < n; i++) {
        printf("%d: %s,%d\n", i + 1, name[i], num[i]);
    }
    return 0;
}

The core of the parsing is performed by sscanf(p, "%31[^,],%d", name[n], &num[n]):

  • p is a pointer to the beginning of the input line, after any initial white space.
  • %31[^,] copies a non-empty string of up to 31 characters to name[n], stopping at the end of the string or at a comma, whichever comes first.
  • ,%d accepts a comma and converts an integer to num[n].

The conversion will fail to return 2 if any of these occur:

  • the input string starts with a comma, causing the first conversion to fail,
  • the input string does not contain a comma in the first 32 positions, causing the , in the format to not match,
  • the input string does not contain a number after the comma, causing the second conversion to fail. A number is a sequence of digits optionally preceded by leading white space and/or a sign.

Note that ABC,123 is not a precise format. Did you mean that all lines have a 3-letter word followed by a comma and a 3-digit number? Or more generally a string of characters followed by a comma and a number? Can the string be empty? A precise specification of the problem is very important to write correct code. In the above example, I assumed the lines are no longer than 79 bytes and contain a non-empty string of up to 31 characters, followed by a comma and an integer, and possibly some more characters. For illustration, I also assumed that empty lines and lines starting with # (comment lines) should be ignored.

CodePudding user response:

You can use the string function strtok:
"The C library function char *strtok(char *str, const char *delim) breaks string str into a series of tokens using the delimiter delim."

#include <string.h>

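int main(void) {
    char string[] = "ABC,123";            // strtok modifies the string, so it must be writable
    char *letters = strtok(string, ",");  // first token: "ABC"
    char *numbers = strtok(NULL, "");     // rest of the string: "123"
    return 0;
}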