How to get each string within a buffer fetched with "getline" from a file in C

Time:12-16

I'm trying to read every string separated with commas, dots or whitespaces from every line of a text from a file (I'm just receiving alphanumeric characters with scanf for simplicity). I'm using the getline function from <stdio.h> library and it reads the line just fine. But when I try to "iterate" over the buffer that was fetched with it, it always returns the first string read from the file. Let's suppose I have a file called "entry.txt" with the following content:

test1234 test hello
another test2

And my "main.c" contains the following:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 500

int main()
{
    FILE *fp;
    int currentLine = 1;
    size_t characters, maxLine = MAX_WORD * 500;
    /* Buffer can keep up to 500 words of 500 characters each */
    char *word = (char *)malloc(MAX_WORD * sizeof(char)), *buffer = (char *)malloc((int)maxLine * sizeof(char));

    fp = fopen("entry.txt", "r");
    if (fp == NULL) {
        return 1;
    }

    for (currentLine = 1; (characters = getline(&buffer, &maxLine, fp)) != -1; currentLine++)
    {
        /* This line gets "test1234" onto "word" variable, as expected */
        sscanf(buffer, "%[a-zA-Z_0-9]", word);

        printf("%s", word); // As expected
        
        /* This line should get "test" string, but again it obtains "test1234" from the buffer */
        sscanf(buffer, "%[a-zA-Z_0-9]", word);

        printf("%s", word); // Not intended...

        // Do some stuff with the "word" and "currentLine" variables...
    }

    return 0;
}

What happens is that I'm trying to get every alphanumeric string (which I'll call a word from now on) in sequence from the buffer, but the sscanf function just gives me the first occurrence of a word within the specified buffer string. Also, every line of the entry file can contain an unknown number of words separated by whitespace, commas, dots, special characters, etc.

I'm obtaining every line from the file separately with "getline" because I need to get every word from every line and store it in other place with the "currentLine" variable, so I'll know from which line a given word would've come. Any ideas of how to do that?

Answer:

fscanf has an input stream argument. A stream can change its state, so that the second call to fscanf reads a different thing. For example:

fscanf(stdin, "%s", str1);  // str1 contains some string; stdin advances
fscanf(stdin, "%s", str2);  // str2 contains some other string

scanf does not have a stream argument, but it has a global stream to work with, so it works exactly like fscanf(stdin, ...).

sscanf does not have a stream argument, nor is there any global state to keep track of what was read. There is an input string. You scan it, some characters get converted, and... nothing else changes. The string remains the same string (how could it possibly be otherwise?) and no information about how far the scan has advanced is stored anywhere.

sscanf(buffer, "%s", str1);  // str1 contains some string; nothing else changes
sscanf(buffer, "%s", str2);  // str2 contains the same string

So what does a poor programmer do?

Well, I lied. No information about how far the scan has advanced is stored anywhere unless you request it. The %n conversion specifier stores the number of characters consumed so far into an int.

int nchars;
sscanf(buffer, "%s%n", str1, &nchars); // str1 contains some string;
                                       // nchars contains number of characters consumed
sscanf(buffer + nchars, "%s", str2);   // str2 contains some other string

Error handling and %s field widths omitted for brevity. You should never omit them in real code.
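
Applied to the question, the same %n trick can be wrapped in a small helper that walks the buffer one word at a time. This is only a sketch under a few assumptions: next_word is a hypothetical name (not a library function), the caller's word buffer holds MAX_WORD (500) bytes so the field width is hard-coded to 499, and the a-z style ranges inside the scanset behave as they do in glibc (the C standard leaves them implementation-defined).

```c
#include <stdio.h>

#define MAX_WORD 500

/* Hypothetical helper: copies the next alphanumeric word found at *cursor
 * into word (which must hold at least MAX_WORD bytes) and advances *cursor
 * past it. Returns 1 if a word was read, 0 when the string is exhausted. */
int next_word(const char **cursor, char *word)
{
    int consumed = 0;

    /* Skip a run of separators (anything that is not a word character).
     * If there is nothing to skip, the conversion fails before %n is
     * reached and consumed keeps its initial value of 0. */
    sscanf(*cursor, "%*[^a-zA-Z_0-9]%n", &consumed);
    *cursor += consumed;

    /* Read one word; 499 = MAX_WORD - 1 leaves room for the terminator.
     * A longer word is returned in pieces over successive calls. */
    consumed = 0;
    if (sscanf(*cursor, "%499[a-zA-Z_0-9]%n", word, &consumed) != 1)
        return 0;

    *cursor += consumed;
    return 1;
}
```

Inside the getline loop from the question, one would set const char *cursor = buffer; and then call next_word(&cursor, word) repeatedly until it returns 0, handling word together with currentLine on each iteration.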
