Trouble understanding fseek offset

Time:11-30

I have a text file in which each line is an integer followed by a newline character. I also have a .bin file with the same contents.

10
20
30
40
50
60
70

Running this code...

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int input;
    FILE *infile_t = fopen("numbers.txt", "r");
    FILE *infile_b = fopen("numbers.bin", "rb");

    if (infile_t == NULL) {
        printf("Error: unable to open file %s\n", "numbers.txt");
        exit(1);
    }

    if (infile_b == NULL) {
        printf("Error: unable to open file %s\n", "numbers.bin");
        exit(1);
    }

    printf("Enter an integer index: ");
    while(scanf("%d",&input) != EOF){
        int ch;

        fseek(infile_t, (input*sizeof(int))-1, SEEK_SET);
        fscanf(infile_t, "In text file: %d\n", &ch);
        printf("In text file: %d\n", ch);

        fseek(infile_b, (input*sizeof(int))-1, SEEK_SET);
        fscanf(infile_b, "%d\n", &ch);
        printf("In binary file: %d\n", ch);

        printf("Enter an integer index: ");
    }

    fclose(infile_t);
    fclose(infile_b);

    return 0;
}

and entering 0, 1, 2, 3, 4 consecutively, I get the outputs: 10 0 40 50 0

I am trying to read the file 4 bytes at a time (one int per read) and print each integer. What am I doing wrong, and if this is bad practice, what would be better?

CodePudding user response:

There is a difference between the textual representation of numbers and their binary representation.

Your input is a text file, which is a sequence of characters (lf stands for the line feed character):

"10lf20lf30lf40lf50lf60lf70lf"

Its size is 21 bytes, which you can check in your file explorer.

As bytes in tabular form it looks like this, assuming that you are using ASCII and a Unix-like system:

Offset  Bytes     Text
0       31 30 0A  "10lf"
3       32 30 0A  "20lf"
6       33 30 0A  "30lf"
9       34 30 0A  "40lf"
12      35 30 0A  "50lf"
15      36 30 0A  "60lf"
18      37 30 0A  "70lf"

There are no integers stored in binary form in your input file.
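One way to see these bytes for yourself is to dump the file in hex. The following is a minimal sketch; hex_dump is a hypothetical helper name, and it assumes the stream was opened in binary mode so that no newline translation takes place.

```c
#include <stdio.h>

/* Hypothetical helper: writes the bytes of f into buf as space-separated
 * uppercase hex pairs. Returns the number of bytes dumped and stops early
 * if buf would overflow. */
static size_t hex_dump(FILE *f, char *buf, size_t buf_size) {
    size_t n = 0;     /* bytes dumped so far */
    size_t used = 0;  /* characters written to buf so far */
    int c;

    if (buf_size == 0) return 0;
    buf[0] = '\0';
    rewind(f);

    /* Each byte needs at most a space, two hex digits and the final NUL. */
    while (used + 4 <= buf_size && (c = fgetc(f)) != EOF) {
        if (n > 0) buf[used++] = ' ';
        used += (size_t)snprintf(buf + used, buf_size - used, "%02X", (unsigned)c);
        n++;
    }
    return n;
}
```

Calling hex_dump on a file that contains the seven lines from the question should report 21 bytes, starting with 31 30 0A.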

The function fseek() places the "cursor" into the file at the specified offset.

Then you call fscanf() to scan and interpret(!) the sequence of characters that starts at that offset.

Input  Offset set by fseek()  Text         Resulting value
0      0                      "10lf..."    10
1      4                      "0lf..."     0
2      8                      "lf40lf..."  40
3      12                     "50lf..."    50
4      16                     "0lf..."     0

Since the %d conversion of fscanf() skips leading whitespace, including the line feed, you get "40" in the third case.
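The offsets and values in the table can be reproduced with a short sketch. It assumes the file holds exactly the seven lines from the question with single-byte line feeds. make_sample_file and read_at_stride are hypothetical helpers written for this illustration, not part of the asker's program.

```c
#include <stdio.h>

/* Hypothetical helper: writes the question's seven lines into a temporary
 * file so the sketch does not depend on numbers.txt being present. */
static FILE *make_sample_file(void) {
    FILE *f = tmpfile();
    if (f == NULL) return NULL;
    fputs("10\n20\n30\n40\n50\n60\n70\n", f);
    rewind(f);
    return f;
}

/* Seeks to byte offset index * stride and parses a decimal integer from
 * the characters found there. Returns 0 on success, -1 on failure. */
static int read_at_stride(FILE *f, long index, long stride, int *out) {
    if (fseek(f, index * stride, SEEK_SET) != 0) return -1;
    return fscanf(f, "%d", out) == 1 ? 0 : -1;
}
```

With a stride of 4, indices 0 to 4 yield 10, 0, 40, 50, 0, which matches the output in the question. With a stride of 3, the same calls yield 10, 20, 30, 40, 50.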

In the general case you cannot use fseek() to "jump" to a certain line in a text file, unless you know how long each line is. In your case the length is known: every line occupies 3 bytes (two digits plus the line feed). If you use a factor of 3 instead of 4, you will get what you seem to want.
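When the lines do not all have the same length, one possible approach is to scan the file once, remember where each line starts, and seek to the remembered offset afterwards. The following is a sketch under that assumption; index_lines and read_line_int are hypothetical names, and the caller supplies the array that holds the offsets.

```c
#include <stdio.h>

/* Hypothetical helper: records the offset at which each line of f starts.
 * Stores at most max offsets and returns how many were stored. */
static size_t index_lines(FILE *f, long *offsets, size_t max) {
    size_t count = 0;
    int at_line_start = 1;

    rewind(f);
    for (;;) {
        long pos = ftell(f);  /* position of the character read next */
        int c = fgetc(f);
        if (c == EOF) break;
        if (at_line_start) {
            if (count == max) break;
            offsets[count++] = pos;
        }
        at_line_start = (c == '\n');
    }
    return count;
}

/* Reads the integer on the given 0-based line using the offset index.
 * Returns 0 on success, -1 if the line does not exist or cannot be parsed. */
static int read_line_int(FILE *f, const long *offsets, size_t count,
                         size_t line, int *out) {
    if (line >= count) return -1;
    if (fseek(f, offsets[line], SEEK_SET) != 0) return -1;
    return fscanf(f, "%d", out) == 1 ? 0 : -1;
}
```

The index costs one pass over the file, but afterwards any line can be reached with a single fseek() regardless of how many digits each number has.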

CodePudding user response:

I don't know what is in your 'numbers.bin', and you opened 'numbers.txt' as infile_t but didn't use it.

Assuming that 'numbers.bin' contains the text shown in your question and you open it in binary mode for reading, the bytes stored in the file are as follows (each line ends with the single byte '\n' rather than the two bytes '\r\n'):

\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30

Initially, the file pointer is at the head of the file, pointing to the character '1' (ASCII code 0x31).

\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
↑

When you call scanf("%d",&input) and enter '0', the integer variable input becomes 0. The call fseek(infile_b, input*4, SEEK_SET) then sets the file pointer to offset 0 relative to the beginning of the file.

The next line, fscanf(infile_b, "%d\n", &ch), reads an integer into the variable ch, so ch holds the value 10, which printf then writes to standard output (stdout).

When you enter '1', the file pointer is set to offset 4, which is the fifth byte from the beginning of the file, as follows:

\x31\x30\x0a\x32\x30\x0a\x33\x30\x0a\x34\x30\x0a\x35\x30\x0a\x36\x30\x0a\x37\x30
                ↑

The ASCII code of the character '0' is 0x30. fscanf() therefore parses the single digit "0", stops at the line feed that follows, and stores the integer value 0 in ch.

You can replace fseek(infile_b, input*4, SEEK_SET) with fseek(infile_b, input*3, SEEK_SET), and you will get the expected output.
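Seeking by sizeof(int), as the question attempts, only makes sense if the .bin file really stores raw int values rather than text. The following is a minimal sketch of that variant; write_ints and read_int_at are hypothetical names, and the file it produces depends on the size and byte order of int on the machine that wrote it.

```c
#include <stdio.h>

/* Hypothetical helper: writes count ints in the machine's native binary
 * representation. Returns 0 on success, -1 on a short write. */
static int write_ints(FILE *f, const int *values, size_t count) {
    return fwrite(values, sizeof values[0], count, f) == count ? 0 : -1;
}

/* Reads the int stored at the given 0-based index. Every element occupies
 * sizeof(int) bytes, so the byte offset is index * sizeof(int).
 * Returns 0 on success, -1 if the seek or the read fails. */
static int read_int_at(FILE *f, long index, int *out) {
    if (fseek(f, index * (long)sizeof *out, SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}
```

Here the stream should be opened in binary mode ("rb" or "wb"), and fread() takes the place of fscanf() because the bytes are the integer itself, not digits to be parsed.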
