Access Binary form of Text saved in memory using an Array

Time:07-16

The storage representation of a string, or equivalently of text read from a file, is the ASCII code of each character. I have been told that I/O functions like fread and fgets read a string from disk into memory without conversion. The C compiler always works with the storage representation, so when we "retrieve" a string in C, it is always in binary form.

I need to access this binary form in my code (without saving it as a binary file, and I am not asking how to print it in binary format).

For example, the text string "AA" is stored in memory as 0100000101000001. I need to access this binary form directly, without any conversion (such as the formatting printf does with %s or %d), through an integer array, say D[16], whose elements are 0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1. Then, indexing with an int i, D[i] would give bit i of the text; for example, D[4] is 0.

Array-index operations like buffer[i] (for example, in the sample code below) extract one character from a string:

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("a.txt", "r");
    if (fp == NULL)
        return 1;

    char buffer[100];
    int r = fread(buffer, 1, sizeof(buffer), fp);
    fclose(fp);
    if (r <= 0)
        return 1;

    printf("As string: %.*s\n", r, buffer);

    printf("As integers:");
    for (int i = 0; i < r; i++)
        printf(" %d", buffer[i]);
    printf("\n");
    return 0;
}

But I would like to have the complete text as an array of 0s and 1s. Here, each buffer[i] holds 8 bits, and I cannot access each of those bits individually. How can I do that?

Answer:

I have been told that I/O functions like fread and fgets will read a string from disk into memory without conversion.

This is true if the file has been opened in binary mode, i.e. with "rb". Such streams undergo no translation when read into memory, and all stream functions, getc() included, read the contents exactly as they are stored on disk. On Unix-based systems there is no difference from "r", but on legacy systems the differences can be substantial: text mode, which is the default, may imply end-of-line conversion, code page translation, end-of-file marker handling... If you want the actual file contents, always use binary mode ("rb").

You should also avoid the char type when dealing with binary representation, because char is signed by default on many architectures, which makes it inappropriate for byte values, usually considered non-negative. Use unsigned char to prevent this issue. (*)

The most common way to display binary contents is using hexadecimal representation, where each byte is output as exactly 2 hex digits.

If you want to output the binary representation, there is no standard printf conversion for base-2 numbers before C23 (which adds %b), but you can write a loop that converts each byte to its bit values.


(*) Among other historical issues, such as signed representations other than two's complement.

Here is a modified version:

#include <stdio.h>

int main(void) {
    FILE *fp = fopen("a.txt", "rb");  /* binary mode: bytes exactly as stored */
    if (fp == NULL) {
        perror("a.txt");
        return 1;
    }
    unsigned char buffer[100];
    unsigned char bits[100 * 8];
    int r = fread(buffer, 1, sizeof(buffer), fp);
    if (r <= 0) {
        fprintf(stderr, "empty file\n");
        fclose(fp);
        return 1;
    }
    
    printf("As a string: %.*s\n\n", r, (char *)buffer);
    int pos;
    pos = printf("As 8-bit integers:");
    for (int i = 0; i < r; i++) {
        if (pos > 72) {
            printf("\n");
            pos = 0;
        }
        pos += printf(" %d", buffer[i]);
    }
    printf("\n\n");
    
    pos = printf("As hex bytes:");
    for (int i = 0; i < r; i++) {
        if (pos > 72) {
            printf("\n");
            pos = 0;
        }
        pos += printf(" %02X", buffer[i]);
    }
    printf("\n\n");
    
    pos = printf("Converting to a bit array:");
    for (int i = 0; i < r; i++) {
        for (int j = 8; j-- > 0;) {
            /* j runs 7..0: store the most significant bit first */
            bits[i * 8 + 7 - j] = (buffer[i] >> j) & 1;
        }
    }
    /* output the bit array */
    for (int i = 0; i < r * 8; i++) {
        if (pos > 72) {
            printf("\n    ");
            pos = 4;
        }
        pos += printf("%d", bits[i]);
    }
    printf("\n");
    fclose(fp);
    return 0;
}

Answer:

Use bit masking to check the value of individual bits. Check out a brief description here: https://www.learn-c.org/en/Bitmasks

Then you can write each result to the element of your array that corresponds to that bit.
