Reading a pdf file with fread in C does not end up as expected

Time:03-02

I am trying to read from a PDF file and write its contents to another file, and this is where I run into a problem.

In the while loop, the first call to fread reads only 589 bytes instead of the expected 1024. On the second iteration, fread reads 0 bytes.

I am sure that the PDF file is larger than 1024 bytes.

Here is a similar problem with the same symptom, but that one was caused by strlen(), which I do not use.

So how can I resolve the problem?

My code is here:

#include <stdio.h>

#define MAXLINE 1024

int main() {
    FILE *fp;
    int read_len;
    char buf2[MAXLINE];
    FILE *fp2;
    fp2 = fopen("test.pdf", "w");
    if ((fp = fopen("LearningSpark.pdf", "r")) == NULL) {
        printf("Open file failed\n");
    }
    while ((read_len = fread(buf2, sizeof(char), MAXLINE, fp)) > 0) {
        int write_length = fwrite(buf2, sizeof(char), read_len, fp2);
        if (write_length < read_len) {
            printf("File write failed\n");
            break;
        }
    }
    return 0;
}

CodePudding user response:

fopen(filename, "r") is system dependent. See this post on what may happen to the data you read if you are on Windows, for example. Basically, it comes down to how certain characters are translated in text mode on different systems, i.e., "End-of-Line" is \n on Unix-type systems but \r\n on Windows.

Important: On Windows, ASCII char 26 (0x1A, Ctrl-Z) is treated as End-Of-File when reading in text mode, "r", causing fread() to terminate prematurely.

To read a binary file, use the "rb" specifier. Similarly for "w", as mentioned here, you should use "wb" to write binary data.
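One way to check whether an early End-Of-File byte explains the short read is to look for the first Ctrl-Z byte (0x1A, decimal 26) while reading in binary mode. The following is a minimal sketch, not a definitive tool; the helper name first_ctrl_z_offset is hypothetical and not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: returns the offset of the first 0x1A (Ctrl-Z) byte
   in the file, or -1 if there is none or the file cannot be opened. */
long first_ctrl_z_offset(const char *path) {
    FILE *fp = fopen(path, "rb");   /* binary mode: no translation, no early EOF */
    long offset = 0;
    int c;

    if (fp == NULL) {
        return -1;
    }
    while ((c = fgetc(fp)) != EOF) {
        if (c == 0x1A) {
            fclose(fp);
            return offset;
        }
        offset++;
    }
    fclose(fp);
    return -1;
}
```

Called on LearningSpark.pdf, a result of 589, or slightly more if \r\n pairs were collapsed before that point, would be consistent with the short read described in the question.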

CodePudding user response:

Binary files such as PDF files must be opened in binary mode to prevent end-of-line translation and other text mode handling on legacy systems such as Windows.

Also note that you should abort when fopen() fails and you should close the files.

Here is a modified version:

#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAXLINE 1024

int main() {
    char buf2[MAXLINE];
    int read_len;
    FILE *fp;
    FILE *fp2;
    if ((fp = fopen("LearningSpark.pdf", "rb")) == NULL) {
        fprintf(stderr, "Open file failed for %s: %s\n", "LearningSpark.pdf", strerror(errno));
        return 1;
    }
    if ((fp2 = fopen("test.pdf", "wb")) == NULL) {
        fprintf(stderr, "Open file failed for %s: %s\n", "test.pdf", strerror(errno));
        fclose(fp);
        return 1;
    }

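    int status = 0;
    /* Copy in MAXLINE-byte chunks; fread() returns 0 at end of file or on error. */
    while ((read_len = fread(buf2, 1, MAXLINE, fp)) > 0) {
        int write_length = fwrite(buf2, 1, read_len, fp2);
        if (write_length < read_len) {
            fprintf(stderr, "File write failed: %s\n", strerror(errno));
            status = 1;
            break;
        }
    }
    /* Distinguish a read error from a normal end of file. */
    if (ferror(fp)) {
        fprintf(stderr, "File read failed: %s\n", strerror(errno));
        status = 1;
    }
    fclose(fp);
    fclose(fp2);
    return status;
}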
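Since fread() and fwrite() return size_t rather than int, the same loop can also be written as a reusable function that keeps the counts unsigned and reports failures to the caller. This is only a sketch under stated assumptions: the names copy_file and CHUNK are hypothetical, and the long return value assumes the file size fits in a long.

```c
#include <stdio.h>

#define CHUNK 1024

/* Hypothetical helper: copies src to dst byte for byte in binary mode.
   Returns the number of bytes copied, or -1 on any failure. */
long copy_file(const char *src, const char *dst) {
    unsigned char buf[CHUNK];
    size_t n;
    long total = 0;
    int failed = 0;
    FILE *in = fopen(src, "rb");
    FILE *out;

    if (in == NULL) {
        return -1;
    }
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            failed = 1;             /* short write */
            break;
        }
        total += (long)n;
    }
    if (ferror(in)) {
        failed = 1;                 /* loop ended on a read error, not end of file */
    }
    fclose(in);
    if (fclose(out) != 0) {
        failed = 1;                 /* fclose() flushes buffered data and can fail */
    }
    return failed ? -1 : total;
}
```

A call such as copy_file("LearningSpark.pdf", "test.pdf") should then return the full size of the source file rather than stopping after the first partial chunk.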