I have a text file of the following form. The left column holds player names and the right column holds each player's score in a game they played.
john 40
mary 50
john 30
kevin 88
kevin 29
joe 102
david 11
mary 134
I want to sum the scores for each player and print output of the form:
john 70
mary 184
kevin 117
joe 102
david 11
I know that this can easily be done in R or Python, but I want to do it in C. So I tried declaring an array of structures in C and reading each line from the file. The array is a global variable, so by default the struct members are initialized to zero (or to null characters in the case of a char array). Then I tried to read each row into a struct, which is itself an element of the array. But while implementing this, I got stuck at the point where new rows have to be read and stored into structs. Is there an efficient way to do this? Since R and pandas are built on C, their underlying code is probably written in C. How is it done there?
Thanks
CodePudding user response:
Typically you'd read a line, split it up on whitespace, see if an entry with that name already exists in a hash table or tree or other map data structure, and if so, add the current value to it, and if not, insert it using the current value. Then at the end traverse the map printing out the entries. Basically, the same approach you'd take with any language.
However, those other languages often have things like map data structures, high-level abstractions for reading files and parsing text, etc., so a task like this can be done in a few lines (shoot, awk can do it in one). With C, you have to write most of that stuff yourself or use add-on libraries; the C standard library, for example, has no hash tables or trees. You basically have to do manually everything that languages like Python are doing for you under the hood.
Here's an example that uses the POSIX binary search tree functions (An awkward but portable API):
#define _GNU_SOURCE // For tdestroy(), a glibc extension
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <search.h>

struct record {
  int num;
  char name[]; // Flexible array member; storage is allocated with the struct
};

struct record *make_record(const char *name, int num) {
  size_t len = strlen(name);
  // Room for the struct, the name and its terminating null character.
  struct record *r = malloc(sizeof *r + len + 1);
  if (!r) {
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  r->num = num;
  memcpy(r->name, name, len);
  r->name[len] = 0;
  return r;
}

int reccmp(const void *va, const void *vb) {
  const struct record *a = va, *b = vb;
  return strcmp(a->name, b->name);
}

void print_rec(const void *nodep, VISIT which, int depth) {
  (void)depth;
  // Print records in sorted order.
  if (which == postorder || which == leaf) {
    const struct record *r = *(const struct record **)nodep;
    printf("%s\t%d\n", r->name, r->num);
  }
}

int main(int argc, char **argv) {
  if (argc != 2) {
    fprintf(stderr, "Usage: %s filename\n", argc > 0 ? argv[0] : "program");
    return EXIT_FAILURE;
  }
  FILE *fp = fopen(argv[1], "r");
  if (!fp) {
    fprintf(stderr, "%s: Unable to open %s: %s\n", argv[0], argv[1],
            strerror(errno));
    return EXIT_FAILURE;
  }
  void *counts = NULL; // Opaque pointer to the root of the tree
  int lineno = 0;
  char *line = NULL;
  size_t line_len = 0;
  while (getline(&line, &line_len, fp) > 0) {
    lineno += 1;
    char *saveptr = NULL;
    char *name = strtok_r(line, " ", &saveptr);
    char *numstr = strtok_r(NULL, " ", &saveptr);
    if (!name || !*name || !numstr || !*numstr) {
      fprintf(stderr, "Line %d of input is malformed!\n", lineno);
      continue;
    }
    int num = atoi(numstr);
    struct record *new_rec = make_record(name, num);
    // tsearch() either inserts a new node and returns a pointer to it,
    // or returns a pointer to an existing matching node.
    struct record *found_rec =
        *(struct record **)tsearch(new_rec, &counts, reccmp);
    if (new_rec != found_rec) {
      // If it's the latter, add to its running sum and free the struct used
      // to look it up.
      found_rec->num += num;
      free(new_rec);
    }
  }
  free(line);
  fclose(fp);
  twalk(counts, print_rec);
#ifdef __GLIBC__
  // Prevent spurious warnings from tools like ASan and valgrind about
  // memory leaks.
  tdestroy(counts, free);
#endif
  return 0;
}
Example usage:
$ gcc -g -O -Wall -Wextra group.c
$ ./a.out input.txt
david 11
joe 102
john 70
kevin 117
mary 184
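For a small number of players, the array-of-structs idea from the question can also work with nothing beyond standard C. The following is only a minimal sketch of that approach, not a replacement for the tree version above: `struct player`, `add_score`, `sum_scores`, `MAX_PLAYERS` and `MAX_NAME` are hypothetical names and limits chosen for illustration, and lookup is a plain linear search.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PLAYERS 100 /* assumed upper bound on distinct names */
#define MAX_NAME 64     /* assumed maximum name length, including the NUL */

struct player {
  char name[MAX_NAME];
  int total;
};

/* Add score to the entry for name, appending a new entry if none exists.
   Returns 0 on success, -1 if the table is already full. */
int add_score(struct player *tab, size_t *count, const char *name, int score) {
  assert(tab && count && name);
  for (size_t i = 0; i < *count; i++) {
    if (strcmp(tab[i].name, name) == 0) {
      tab[i].total += score;
      return 0;
    }
  }
  if (*count == MAX_PLAYERS)
    return -1;
  /* snprintf truncates over-long names instead of overflowing the buffer. */
  snprintf(tab[*count].name, MAX_NAME, "%s", name);
  tab[*count].total = score;
  (*count)++;
  return 0;
}

/* Read "name score" lines from in and accumulate totals in tab, which must
   have room for MAX_PLAYERS entries. Lines that do not parse are skipped.
   Returns the number of distinct names stored. */
size_t sum_scores(FILE *in, struct player *tab) {
  char line[256];
  char name[MAX_NAME];
  int score;
  size_t count = 0;
  while (fgets(line, sizeof line, in)) {
    /* The field width 63 is MAX_NAME - 1; keep the two in sync. */
    if (sscanf(line, "%63s %d", name, &score) == 2)
      add_score(tab, &count, name, score);
  }
  return count;
}
```

A `main` would declare `struct player tab[MAX_PLAYERS];`, call `sum_scores(fp, tab)` on the opened file, and print each entry in a loop. Entries stay in order of first appearance, which matches the output order asked for in the question. Each line costs a scan of the whole table, so with many distinct names a tree or hash table is the better choice.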