Home > Software design >  Using fgets() and strtok() to read in a file line-by-line in C
Using fgets() and strtok() to read in a file line-by-line in C

Time:05-02

EDITED

First time here!

I am trying to split a line in words and save them in an array of 13 positions. I have a CSV file (bcancer.csv) separated with ; with 13 columns.

The file contains 700 lines like this ones with obviously different info (no blank lines at all):

1000025;0;89;5;1;1;1;2;1;3;1;1;2

1002945;0;71;5;4;4;5;7;10;3;2;1;2

1015425;0;55;3;1;1;1;2;2;3;1;1;2

1016277;0;76;6;8;8;1;3;4;3;7;1;2

1017023;0;91;4;1;1;3;2;1;3;1;1;2

And I don't know if I should use a 2D array or unidimensional, because I have to print all the info later in another function.

As for now, I have this code:

#include<stdio.h>
#include<string.h>

int muestraMenu ();
void opcion0 ();

int main ()
{
  int opcion = 0;
 
  do
    {
      opcion = muestraMenu ();
      printf ("\tOpción elegida: %d\n", opcion);
      switch (opcion)
    {
    case 0:
      printf (" 0. Configuración e inicialización");
      opcion0();
      break;
    }
    }
  while (opcion != 0);
  return 0;
}

int
muestraMenu ()
{
  int opcion;
  printf ("\n   0. Configuración e inicialización");
  printf ("\n\n   Introduzca opción: ");
  scanf ("%d", &opcion);
  return opcion;
}

void opcion0 ()
{
    char line[100] = "";
    char *datos_pac[13];
    int i = 0;
    FILE *f = fopen ("bcancer.csv", "r");

    while(!feof(f))
    {
    fgets(line,100,f);
    datos_pac[i] = strtok(line, ";");  //Guardo primer elemento

    while(datos_pac[i] != NULL)     //Guardo el resto de elementos
    {
        datos_pac[  i] = strtok(NULL, ";");
        printf("%s ",datos_pac[i-1]);
    }
    }
}

But it's not working,the first lines are printed wrong and second one it's not printed at all. Also after 50 lines it stopped.

For the output I expect to save all the info in the lines in an array.

datos_pac[0][0]=1000025
datos_pac[0][1]=0
datos_pac[0][2]=89

and so on...

Please let me know where I'm going wrong, thanks!

CodePudding user response:

You cannot use strtok() for this task because strtok() considers sequences of delimiters as a single delimiter. This is OK for splitting space separated words, but inadequate for CSV parsing where ; can delimit empty fields.

You can use strchr() or strcspn() to find the end of the current field and use strndup() to allocate a copy of the string.

Also note that while (!feof(f)) is always wrong and read Why is “while ( !feof (file) )” always wrong? .

Here is a modified version:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void opcion0(void) {
    char line[100];
    FILE *f = fopen("bcancer.csv", "r");

    if (f == NULL) {
        fprintf(stderr, "error opening bcancer.csv: %s\n", strerror(errno));
        return;
    }
    while (fgets(line, sizeof line, f)) {
        char *datos_pac[13];
        int i, pos = 0;
        for (i = 0; i < 13; i  ) {
            i = strcspn(line   pos, ";\n");
            // make datos_pac pointers point inside line
            datos_pac[i] = line   pos;
            pos  = i;
            if (line[pos] != '\0') {
                // set a null terminator at the end of the field
                line[pos  ] = '\0';
            }
        }
        for (i = 0; i < 13; i  ) {
            printf("%s ", datos_pac[i]);
        }
        printf("\n");
        // use datos_pac for some other use
    }
    fclose(f);
}

The above code does not allocate memory. If you want the keep the strings for later use and store them inside a array, you could replace the inner code with this allocation alternative:

        i = strcspn(line   pos, ";\n");
        // allocate a copy of the field into the datos_pac array
        datos_pac[i] = strndup(line   pos, i);
        pos  = i;
        if (line[pos] == ';') {
            pos  ;
        }

It is then your responsibility to free these strings after use and before the pointers go out of scope.

Alternately, if all data in the CSV file are numbers, you could use sscanf() to parse the line:

    while (fgets(line, sizeof line, f)) {
        int data[13];
        if (sscanf(line, "%d;%d;%d;%d;%d;%d;%d;%d;%d;%d;%d;%d;%d",
                   &data[0], &data[1], &data[2], &data[3], &data[4], 
                   &data[5], &data[6], &data[7], &data[8], &data[9], 
                   &data[10], &data[11], &data[12]) == 13) {
            /* all fields were converted correctly */
            printf("%d %d %d %d %d %d %d %d %d %d %d %d %d\n",
                   data[0], data[1], data[2], data[3], data[4], 
                   data[5], data[6], data[7], data[8], data[9], 
                   data[10], data[11], data[12]);
            // ...
        } else {
            printf("invalid data: %s\n", line);
        }
    }

CodePudding user response:

Flaws in the code presented include:

  • while (!feof(f)) is always wrong. Short version: feof speaks to whether a previous read detected end-of-file, not (otherwise) to whether the next one will do. Instead, attempt the read and test whether it succeeded.

  • The return value of fgets() is not tested, so you don't know whether it succeeded. See also the previous point.

  • There is no check of whether a complete line has been read, or handling for the case where it isn't. When a complete line is read, the trailing newline will be included among the characters that fgets() writes in the buffer. If the line, including newline, is longer than the buffer can accommodate then the tail will remain unread, and the next fgets() will pick up from there.

  • if a complete line is read, then the code presented will include that trailing newline in the last field's value. That is unlikely to be what you want.

  • strtok() tokenizes in-place by inserting string terminators into the specified string. Because tokens generated that way share storage with the original string,

    • in the code presented, the tokens for each line will be overwritten when you read the next line, and

    • the lifetime of the tokens is the same as that of the buffer, so they will not outlive the call to function opcion0.

    Neither of those is a problem for the code presented, but both will be problems for saving the tokens to print later in another function. For that, you need a data structure that can hold the data for all the lines at once, and you need to make copies of the tokens that will survive from one loop iteration to the next, and onward after the function returns. The data structure might be a 2D array of char * of a linked list of 1D array of char *, for example. To make the tokens sufficiently persistent, you probably need to make dynamically-allocated copies, such as via strdup().

  • datos_pac[ i] = strtok(NULL, ";") always stores one more value (a null pointer) in array datos_pac than there are elements in the line read. Because you allocate only enough space for exactly the number of elements you expect, this will overrun the bounds of your array when you get (at least) the number of elements you expect. You should not try to be so clever. Instead, accept strtok's return value into a for-purpose scalar variable, test whether it is null, and assign it to the array only if it is not. (But see also next)

  • The code does not protect against overrun in the case where the data are malformed, with more elements in one line than you are prepared to accommodate. That might happen, for instance, if two lines are accidentally concatenated. Robust code would protect against that.

  • The code does not reset i to 0 between lines. This will cause (further) array overruns when the second and subsequent lines are read. The resulting undefined behavior might take any form, but it is likely that the program would crash pretty soon.

CodePudding user response:

First, as @DaveMeehan notes, we need a proper minimal reproducible example - code and sample input - to understand what's going on.

That said... you should try to debug your program yourself! See:

and here is some specific advice about your program, which may or may not help with your bug:

  • Please use strtok_r() instead of strtok(), for your code to be re-entrant / thread safe. (Example use here).

  • You must check the return value of fgets().

  • You should probably not assume the input line fits into 100 bytes; and whatever size you limit your buffer to - use a proper named constant rather than a "magic number".

  •  Tags:  
  • c
  • Related