I am working through several exercises to understand the difference between writing text files and binary files in C, and when I inspect the results with a hexdump utility I find unexpected output. Can you please help me understand why?
Particularly, I am trying the following code for writing a text file:
#include <stdio.h>
int main() {
FILE *ptr_myfile;
char c = 'a';
int numero = 12345;
ptr_myfile = fopen("test.txt","w");
if (!ptr_myfile){
printf("Unable to open file!");
return 1;
}
fwrite(&c, sizeof(char), 1, ptr_myfile);
fwrite(&numero, sizeof(int), 1, ptr_myfile);
fclose(ptr_myfile);
return 0;
}
Running "cat test.txt" shows that the contents of the file are:
cat test.txt
a90
I cannot understand how 12345 was converted to 90.
Moreover, if I run hexdump on the file I get:
hexdump test.txt
0000000 3961 0030 0000
0000005
In that case, the first byte shown has the value 39. Why? The second value (61) matches the ASCII value of 'a' (61 hex = 97 dec = 'a' ASCII code), but I cannot find a logical explanation for the rest of the bytes.
If I change the writing mode to binary by replacing
ptr_myfile=fopen("test.txt","w") with ptr_myfile=fopen("test.txt","wb")
I do not see any change in the contents written to the file.
CodePudding user response:
The contents of the file test.txt are:
$ hexdump -C test.txt
00000000 61 39 30 00 00 |a90..|
00000005
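The first byte, 61, is 'a', and the bytes after it are the little-endian representation of 12345.
Plain hexdump (without -C) groups the output into 16-bit words printed in host byte order, which is why the bytes 61 39 appeared as 3961 in your output; -C prints one byte at a time, in file order.
39 30 00 00 are 4 bytes, which is the typical size of an int.
Note that this number is not 0x39300000 but 0x00003039, which is 12345 in decimal.
The byte order of the number written depends on the endianness of your system.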
You can observe this yourself by using htonl to convert from host endianness to big-endian (network byte order):
#include <stdio.h>
#include <arpa/inet.h> // htonl on POSIX systems (on Windows it comes from <winsock2.h>)
int main() {
FILE *ptr_myfile;
char c = 'a';
int numero = 12345;
ptr_myfile = fopen("test.txt","w");
if (!ptr_myfile) {
printf("Unable to open file!");
return 1;
}
// convert from host endianness to network byte order
int numero_big_endian = htonl(numero);
fwrite(&c, sizeof(char), 1, ptr_myfile);
fwrite(&numero_big_endian, sizeof(int), 1, ptr_myfile);
fclose(ptr_myfile);
return 0;
}
This will yield:
$ hexdump -C test.txt
00000000 61 00 00 30 39 |a..09|
00000005
As you can see, the byte order is now reversed.
Differences in endianness are one of the reasons why you might not want to write binary data directly to disk.
A big-endian system would read the bytes 39 30 00 00 as 0x39300000, which is 959447040 and not 12345.
As others have mentioned, fwrite does not write data in its string representation.
If you want that, you can use snprintf to convert your number to a string first and then write it to a file (or use fprintf to do both in one step):
#include <stdio.h>
#include <string.h>
int main() {
FILE *ptr_myfile;
char c = 'a';
int numero = 12345;
ptr_myfile = fopen("test.txt","w");
if (!ptr_myfile) {
printf("Unable to open file!");
return 1;
}
// convert numero to a string
char numero_str[64];
// check result of snprintf, omitted for readability
snprintf(numero_str, sizeof(numero_str), "%d", numero);
fwrite(&c, sizeof(char), 1, ptr_myfile);
fwrite(numero_str, strlen(numero_str), 1, ptr_myfile);
fclose(ptr_myfile);
return 0;
}
$ cat test.txt
a12345
CodePudding user response:
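When you use fwrite, the data is treated as a raw sequence of bytes of a given length; it is never converted to its textual form. This has nothing to do with the file opening mode you selected earlier. (On POSIX systems "w" and "wb" behave identically; on Windows, text mode additionally translates each '\n' byte to "\r\n", but numbers are still not converted to digits.)
Let's consider the following example: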
/** A character buffer. */
char *ascii_buf = "ABCD";
/** A buffer which contains binary representation of A, B, C, D letters in ASCII. */
uint8_t binary_buf[4] = { 65, 66, 67, 68 };
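/* fout is assumed to be a FILE * already opened for writing;
   uint8_t needs <stdint.h> and strlen needs <string.h>. */
size_t written = fwrite(ascii_buf, 1, strlen(ascii_buf), fout);
written = fwrite(binary_buf, 1, sizeof(binary_buf), fout);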
Each of the above two calls to fwrite writes the same four bytes, "ABCD", to the target output file.
The only difference resides in the way the data is interpreted in your program. In the first case, the ascii_buf data is interpreted as characters, while in the second case the binary_buf data is interpreted as unsigned integers. Their content is the same, but their representation is different.
You will usually want to use:
fprintf to output formatted strings to a file.
fwrite to output raw data to a file.