Why does writing a text file in C not give me the expected results?

Time:01-08

I am doing several exercises to understand the difference between writing text and binary files in C, and when I look at the results with a hexdump utility I get unexpected output. Can you please help me understand why?

In particular, I am using the following code to write a text file:

#include <stdio.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;

    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile){
        printf("Unable to open file!");
        return 1;
    }

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(&numero, sizeof(int), 1, ptr_myfile);

    fclose(ptr_myfile);

    return 0;
}

When I run "cat test.txt", I find that the contents of the file are:

cat test.txt

a90

I cannot understand how 12345 was converted to 90.

Moreover, if I run

hexdump test.txt

0000000 3961 0030 0000
0000005

In that case, the first byte appears to be written with the value 39. Why? The second value (61) matches the ASCII value of 'a' (0x61 = 97 decimal = ASCII 'a'), but I cannot find a logical explanation for the rest of the bits.

If I change the writing mode to binary by replacing the line

ptr_myfile=fopen("test.txt","w") with ptr_myfile=fopen("test.txt","wb")

I do not see any change in the written contents of the file.

Answer:

The contents of the file test.txt are:

$ hexdump -C test.txt

00000000  61 39 30 00 00                                    |a90..|
00000005
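The plain hexdump output in the question (3961 0030 0000) shows these same five bytes. Without -C, hexdump groups the input into two-byte units and prints each unit as a 16-bit number in host byte order. On a little-endian machine each pair of bytes therefore appears swapped, and the odd fifth byte is shown padded to 0000. The sketch below reproduces that grouping. The function name format_le_words is hypothetical, and it assumes the little-endian layout instead of querying the host.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats `len` bytes the way plain `hexdump`
 * displays them on a little-endian host. Each pair of bytes is treated
 * as one 16-bit word whose SECOND byte is the high half; an odd trailing
 * byte gets a zero high half. The text is written into `out`. */
static void format_le_words(const unsigned char *data, size_t len,
                            char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < len; i += 2) {
        unsigned lo = data[i];
        unsigned hi = (i + 1 < len) ? data[i + 1] : 0u;
        int n = snprintf(out + pos, out_size - pos, "%s%04x",
                         i == 0 ? "" : " ", (hi << 8) | lo);
        if (n < 0 || (size_t)n >= out_size - pos)
            break; /* buffer too small: keep what fits and stop */
        pos += (size_t)n;
    }
}
```

Fed the file's bytes 61 39 30 00 00, this produces "3961 0030 0000", the same words the question shows, so the "first byte 39" is really the second byte of the file printed as the high half of a 16-bit word.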

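The first byte, 61, is 'a', and the bytes after it are the little-endian representation of 12345.

39 30 00 00 are 4 bytes, which is the typical size of an int.

Note that this number is not 0x39300000 but 0x00003039, and 0x3039 is 12345 in decimal (3 × 4096 + 3 × 16 + 9). The bytes 0x39 and 0x30 also happen to be the ASCII codes for '9' and '0', and the two zero bytes are not printable, which is why cat shows a90.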
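To see where "90" comes from, you can copy the int into an array of unsigned char and inspect each byte; those are exactly the bytes fwrite sends to the file. A minimal sketch, assuming a 4-byte int; the helper names int_to_bytes and show_int_bytes are hypothetical:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copies the in-memory representation of `value` into `out`.
 * These are the bytes fwrite(&value, sizeof(int), 1, fp) would write. */
static void int_to_bytes(int value, unsigned char out[sizeof(int)])
{
    memcpy(out, &value, sizeof value);
}

/* Prints each byte in hex and, when printable, the character that
 * cat would display for it. */
static void show_int_bytes(int value)
{
    unsigned char bytes[sizeof(int)];

    int_to_bytes(value, bytes);
    for (size_t i = 0; i < sizeof bytes; i++) {
        if (isprint(bytes[i]))
            printf("%02x '%c'\n", bytes[i], bytes[i]);
        else
            printf("%02x (not printable)\n", bytes[i]);
    }
}
```

On a little-endian machine with a 4-byte int, show_int_bytes(12345) lists 39 '9' and 30 '0' followed by two unprintable zero bytes, matching what cat displayed after the 'a'.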

The byte order of the written number depends on the endianness of your system.

You can observe this yourself by using htonl to convert from host byte order to big-endian (network byte order):

#include <arpa/inet.h> // htonl (POSIX; on Windows it is declared in <winsock2.h>)
#include <stdint.h>
#include <stdio.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;
    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile) {
        printf("Unable to open file!");
        return 1;
    }

    // convert from host byte order to network byte order (big-endian);
    // htonl takes and returns a uint32_t
    uint32_t numero_big_endian = htonl((uint32_t)numero);

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(&numero_big_endian, sizeof(numero_big_endian), 1, ptr_myfile);
    fclose(ptr_myfile);

    return 0;
}

This will yield:

$ hexdump -C test.txt

00000000  61 00 00 30 39                                    |a..09|
00000005

As you can see, the byte order is now reversed.

Differences in endianness are one of the reasons you might not want to write raw binary data directly to disk.

A big-endian system reading the bytes 39 30 00 00 back would interpret them as 0x39300000, which is 959447040, not 12345.
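A common way around this, sketched here as an alternative to htonl rather than as part of the original answer, is to fix one byte order for the file (big-endian below) and build the bytes explicitly with shifts, so the file has the same contents on every host. The helper names write_u32_be and read_u32_be are hypothetical, and the sketch assumes 8-bit bytes.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes `value` as 4 bytes, most significant byte first, regardless of
 * the host's endianness. Returns 1 on success, 0 on a write error. */
static int write_u32_be(FILE *fp, uint32_t value)
{
    unsigned char b[4] = {
        (unsigned char)(value >> 24),
        (unsigned char)(value >> 16),
        (unsigned char)(value >> 8),
        (unsigned char)value,
    };
    return fwrite(b, 1, sizeof b, fp) == sizeof b;
}

/* Reads 4 big-endian bytes back into a host integer.
 * Returns 1 on success, 0 on a short read. */
static int read_u32_be(FILE *fp, uint32_t *value)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;
    *value = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
             ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return 1;
}
```

With this approach 12345 is always stored as 00 00 30 39. Opening the file with "wb" and "rb" keeps any newline translation from touching the bytes on platforms that perform it.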

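As others have mentioned, fwrite does not write data in its string representation; it copies the raw bytes of the object as they are stored in memory.

If you want text, you can use snprintf to convert your number to a string first and then write that string to the file (or use fprintf, which does both in one step):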

#include <stdio.h>
#include <string.h>

int main() {
    FILE *ptr_myfile;
    char c = 'a';
    int numero = 12345;
    ptr_myfile = fopen("test.txt","w");

    if (!ptr_myfile) {
        printf("Unable to open file!");
        return 1;
    }

    // convert numero to a string
    char numero_str[64];
    // check result of snprintf, omitted for readability
    snprintf(numero_str, sizeof(numero_str), "%d", numero);

    fwrite(&c, sizeof(char), 1, ptr_myfile);
    fwrite(numero_str, strlen(numero_str), 1, ptr_myfile);
    fclose(ptr_myfile);

    return 0;
}
$ cat test.txt

a12345
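As the answer mentions, fprintf can perform the conversion and the write in a single call. A minimal sketch; the helper name write_as_text is hypothetical:

```c
#include <stdio.h>

/* Writes the character and the number as human-readable text, so
 * 'a' and 12345 become the six characters "a12345".
 * Returns the number of characters written, or a negative value on error. */
static int write_as_text(FILE *fp, char c, int numero)
{
    return fprintf(fp, "%c%d", c, numero);
}
```

Calling write_as_text(ptr_myfile, 'a', 12345) on the file opened with "w" leaves a12345 in test.txt, the same result as the snprintf version, without the intermediate buffer.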

Answer:

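When you use fwrite, it writes the bytes it is given as raw binary data of the length you specify; it does not convert anything to text. That holds whichever mode you opened the file in. The only thing the mode can change is newline handling: on Windows, a stream opened in text mode ("w") translates each '\n' byte into "\r\n" on output, even when the byte is written with fwrite, while on POSIX systems such as Linux and macOS text and binary mode behave identically. The bytes in the question contain no '\n' (0x0A), so "w" and "wb" produce the same file.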
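You can check what the mode changes on your own platform by writing the same text with "w" and with "wb" and comparing the resulting file sizes. A sketch under the assumption that the current directory is writable; the helper name bytes_written_with_mode and the file name passed to it are hypothetical:

```c
#include <stdio.h>

/* Writes `text` to `path` using the given fopen mode ("w" or "wb"),
 * then reopens the file in binary mode and returns its size in bytes,
 * or -1 on error. The file is removed afterwards. */
static long bytes_written_with_mode(const char *path, const char *mode,
                                    const char *text)
{
    FILE *fp = fopen(path, mode);
    if (!fp)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb"); /* binary read: count the raw bytes on disk */
    if (!fp)
        return -1;
    long count = 0;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    remove(path);
    return count;
}
```

For example, bytes_written_with_mode("mode_demo.tmp", "w", "a\n") returns 2 on Linux or macOS and 3 on Windows, whereas the "wb" variant returns 2 everywhere.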

Let's consider the following example:

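#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *fout = fopen("out.txt", "w"); /* example output file name */
    if (!fout)
        return 1;
    const char *ascii_buf = "ABCD";             /* a character buffer */
    uint8_t binary_buf[4] = { 65, 66, 67, 68 }; /* ASCII codes of A, B, C, D as raw bytes */
    size_t written = fwrite(ascii_buf, 1, strlen(ascii_buf), fout);
    written += fwrite(binary_buf, 1, sizeof(binary_buf), fout);
    fclose(fout);
    return written == 8 ? 0 : 1;                /* each call writes 4 bytes */
}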

Each of the two fwrite calls above writes the same four bytes, "ABCD", to the output file.

The only difference lies in how the data is declared in the program: ascii_buf is treated as a string of characters, while binary_buf is treated as an array of unsigned 8-bit integers. Their contents are the same bytes; only the type through which the program views them differs, and fwrite ignores that type.
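A quick way to confirm that the two buffers hold identical bytes is to compare them with memcmp, which, like fwrite, looks only at raw bytes and ignores the declared types. A minimal sketch, assuming the execution character set is ASCII; the helper name buffers_match is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* The same two buffers as in the example above. */
static const char *ascii_buf = "ABCD";
static const uint8_t binary_buf[4] = { 65, 66, 67, 68 };

/* Returns 1 when both buffers contain exactly the same bytes (true on
 * ASCII systems), which is why fwrite produces the same output for either. */
static int buffers_match(void)
{
    return strlen(ascii_buf) == sizeof binary_buf &&
           memcmp(ascii_buf, binary_buf, sizeof binary_buf) == 0;
}
```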

You will usually want to use:

  • fprintf to output formatted strings to a file.
  • fwrite to output raw data to a file.
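Applied to the question's number, the difference between the two functions shows up in how many bytes each one produces. A sketch that uses tmpfile() so nothing is left on disk; the helper names formatted_size and raw_size are hypothetical:

```c
#include <stdio.h>

/* Returns how many bytes fprintf produces for `value`
 * (for example "12345" is 5 bytes), or -1 on error. */
static long formatted_size(int value)
{
    FILE *fp = tmpfile();
    if (!fp)
        return -1;
    fprintf(fp, "%d", value);
    long size = ftell(fp);
    fclose(fp);
    return size;
}

/* Returns how many bytes fwrite produces for `value`: always
 * sizeof(int), whatever the value is, or -1 on error. */
static long raw_size(int value)
{
    FILE *fp = tmpfile();
    if (!fp)
        return -1;
    fwrite(&value, sizeof value, 1, fp);
    long size = ftell(fp);
    fclose(fp);
    return size;
}
```

The text size grows with the number of digits, while the raw size stays fixed at sizeof(int), which is why the question's file was 5 bytes long (one 'a' plus a 4-byte int) rather than 6.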