I have a text file with 10000 lines, and each line has this exact format.
ABC,123
I am trying to split these two values into 2 different arrays, where one array holds the letters, and another array holds the numbers. How should I do this?
CodePudding user response:
Your task involves parsing a long text file into arrays of data held in memory. To avoid the need to reallocate the arrays, which you will learn about later, you could just define the arrays with a length of 10000 and then read one line at a time from the file, analyse the line contents and store the appropriate parts into the next elements of the arrays.
Here is an illustrative approach:
#include <stdio.h>
#include <string.h>

#define COUNT 10000

int main() {
    FILE *fp;              // stream pointer to read the text file
    char line[80];         // buffer for a line of input
    char name[COUNT][32];  // array of strings for the names
    int num[COUNT];        // array of integers for the numbers
    int n;                 // number of lines read

    fp = fopen("input.txt", "r");
    if (fp == NULL) {
        fprintf(stderr, "cannot open input file\n");
        return 1;
    }
    n = 0;
    // reading one line at a time for a maximum of COUNT entries
    while (n < COUNT && fgets(line, sizeof line, fp)) {
        // skip white space
        char *p = line + strspn(line, " \t\r\n");
        // ignore empty lines and comment lines
        if (*p == '\0' || *p == '#') {
            continue;
        }
        if (sscanf(p, "%31[^,],%d", &name[n][0], &num[n]) == 2) {
            // 2 fields converted successfully: one more entry in the array
            n++;
        } else {
            // conversion was not successful: issue an error message
            fprintf(stderr, "invalid format: %s\n", line);
        }
    }
    fclose(fp);

    printf("number of entries read: %d\n", n);
    for (int i = 0; i < n; i++) {
        printf("%d: %s,%d\n", i + 1, name[i], num[i]);
    }
    return 0;
}
The core of the parsing is performed by sscanf(p, "%31[^,],%d", &name[n][0], &num[n]):
- p is a pointer to the beginning of the input line, after any initial white space.
- %31[^,] copies a non-empty string of characters to name[n] with up to 31 characters, stopping at the end of the string or at a comma, whichever comes first.
- ,%d accepts a comma and converts an integer to num[n].
The conversion will fail to return 2 if any of these occur:
- the input string starts with a comma, causing the first conversion to fail,
- the input string does not contain a comma in the first 32 positions, causing the comma in the format to not match,
- the input string does not contain a number after the comma, causing the second conversion to fail. A number is a sequence of digits optionally preceded by leading white space and/or a sign.
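To make these cases concrete, here is a small sketch that wraps the same sscanf() call in a helper and notes the return value for a few inputs. The helper name parse_entry is hypothetical and only serves the illustration; it is not part of the program above.

```c
#include <stdio.h>

// Hypothetical helper: applies the same format string as the program above
// and returns the number of fields sscanf() converted (2 means success).
int parse_entry(const char *p, char name[32], int *num) {
    return sscanf(p, "%31[^,],%d", name, num);
}

// Return values for some sample inputs:
//   "ABC,123"  -> 2  (name = "ABC", num = 123)
//   "ABC, -5"  -> 2  (%d skips leading white space and accepts a sign)
//   ",123"     -> 0  (the name would be empty, first conversion fails)
//   "ABC"      -> 1  (no comma after the name)
//   "ABC,xyz"  -> 1  (no number after the comma)
```

Comparing the result against 2, as the loop above does, rejects all of the failing cases with a single test.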
Note that ABC,123 is not a precise format. Did you mean that all lines have a 3-letter word followed by a comma and a 3-digit number? Or, more generally, a string of characters followed by a comma and a number? Can the string be empty? A precise specification of the problem is very important for writing correct code. In the above example, I assumed the lines are no longer than 79 bytes and contain a non-empty string of up to 31 characters, followed by a comma and an integer, and possibly some more characters. For illustration, I also assumed that empty lines should be ignored and that lines starting with # should be ignored as comment lines.
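If the specification turned out to be stricter, for example that nothing except white space may follow the number, one possible variation is to add the %n conversion, which records how many characters have been consumed so far. The sketch below assumes that stricter rule; the helper name parse_strict is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: returns 1 only if s is "<name>,<integer>" followed by
// nothing but optional white space (such as the newline kept by fgets()).
int parse_strict(const char *s, char name[32], int *num) {
    int pos = 0;
    // The space before %n skips trailing white space; %n then stores the
    // number of characters consumed and does not count as a converted field.
    if (sscanf(s, "%31[^,],%d %n", name, num, &pos) != 2)
        return 0;
    // Anything left at s[pos] is trailing garbage.
    return s[pos] == '\0';
}
```

With this check, "ABC,123\n" is accepted while "ABC,123xyz" is rejected, whereas the more permissive format above would accept both.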
CodePudding user response:
You can use the string function strtok:
"The C library function char *strtok(char *str, const char *delim) breaks string str into a series of tokens using the delimiter delim."
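#include <stdio.h>
#include <string.h>

int main() {
    char string[] = "ABC,123";            // strtok modifies the string, so it must be writable
    char *letters = strtok(string, ",");  // first token: text before the comma
    char *numbers = strtok(NULL, "");     // empty delimiter set: the rest of the string
    printf("%s %s\n", letters, numbers);
    return 0;
}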
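To connect this with the two arrays asked about in the question, here is a sketch of how one line could be split with strtok and the numeric part converted with strtol. The helper name split_line and its parameters are hypothetical, and the sketch assumes the number fits in an int.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: splits a line such as "ABC,123\n" in place.
// Returns 1 and fills name and *num on success, 0 on a malformed line.
int split_line(char *line, char *name, size_t name_size, int *num) {
    char *letters = strtok(line, ",");      // text before the comma
    char *numbers = strtok(NULL, "\r\n");   // rest of the line, without the line ending
    if (letters == NULL || numbers == NULL || strlen(letters) >= name_size)
        return 0;
    char *end;
    long value = strtol(numbers, &end, 10);
    if (end == numbers || *end != '\0')     // no digits, or trailing characters
        return 0;
    strcpy(name, letters);                  // safe: length was checked above
    *num = (int)value;                      // assumption: the value fits in an int
    return 1;
}
```

Inside a reading loop, each line obtained with fgets could be passed to split_line together with name[n] and &num[n], incrementing n only when it returns 1. Note that strtok skips leading delimiters, so a line such as ",123" produces only one token and is reported as malformed.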