How can bytes in char array represent integers?

Time:05-04

So let's say I have a char array that I read from a binary file (such as an ext2-formatted filesystem image).

Now I need to read an integer starting at byte offset 1024 (that is, the offset from the start of the data). Is there a neat way of doing it? The integer could be any number, so I believe it can be represented as a 4-byte integer on my system (x86-64).

I believe I need to use strtol like:

/* Convert the provided value to a decimal long */
char *eptr=malloc(4);// 4 bytes because sizeof int is 4 bytes
....
int valread=read(fd,eptr,4);//fd is to ext2 formatted image file (from file system)
result = strtol(eptr, &v, 10);

strtol returns a long, so is that the right type to represent a 32-bit integer?

Should eptr be null terminated?

Is this correct or not?

CodePudding user response:

I have a char array that I read from a binary file (such as an ext2-formatted filesystem image).

Open the file in binary mode

const char *file_name = ...;
FILE *infile = fopen(file_name, "rb");  // b is for binary
if (infile == NULL) {
  fprintf(stderr, "Unable to open file <%s>.\n", file_name);
  exit(1);
}

I need to read an integer starting at byte offset 1024 ...

long offset = 1024; 
if (fseek(infile, offset, SEEK_SET)) {
  fprintf(stderr, "Unable to seek to %ld.\n", offset);
  exit(1);
} 

So I believe it can be represented as a 4-byte integer on my system

Rather than use int, whose size may differ from 4 bytes, consider int32_t from <stdint.h>.

int32_t data4;
if (fread(&data4, sizeof data4, 1, infile) != 1) {
  fprintf(stderr, "Unable to read data.\n");
  exit(1);
} 

Account for endianness.

As the file data is little-endian, convert it to the native byte order. See #include <endian.h>.

data4 = le32toh(data4);
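
Note that <endian.h> and le32toh() are not part of standard C. Where they are unavailable, or when the data is already sitting in a char array as in the question, the four bytes can be combined by hand. The following is only a sketch: load_le32 is a hypothetical helper name, and it assumes the caller guarantees that at least four bytes are available at the given offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer): decode a 32-bit
   little-endian value stored in buf, starting at byte `offset`.
   Assumption: buf holds at least offset + 4 bytes. */
uint32_t load_le32(const unsigned char *buf, size_t offset)
{
    assert(buf != NULL);

    const unsigned char *p = buf + offset;

    /* In little-endian data the least significant byte comes first.
       Each byte is widened to uint32_t before shifting. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the image data in an unsigned char array, load_le32(image, 1024) would then yield the value regardless of the host's own byte order. Using unsigned char rather than plain char avoids sign extension for bytes of 0x80 and above.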

Clean up when done

// Use data4

fclose(infile);

I believe I need to use strtol like

No. strtol() examines a string and returns a long. File data is binary and not a string.
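
To make the difference concrete, here is a small sketch that wraps strtol(). The name parse_decimal is hypothetical; it only reports whether strtol() consumed any characters of the input as a base-10 number.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: store the parsed value in *out and return 1 if
   strtol() consumed at least one character of text, 0 otherwise. */
int parse_decimal(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    *out = strtol(text, &end, 10);

    /* If no digits were found, strtol() leaves end equal to text. */
    return end != text;
}
```

Given the text "1234", this reports success and stores 1234. Given the raw bytes 0xD2 0x04 0x00 0x00 (the little-endian encoding of 1234), strtol() finds no digit characters at all, so the helper reports failure and the stored value is 0.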

CodePudding user response:

In the case of strtol it might be easier to follow along by seeing some code. So here is a very simplified strtol-like function:

int string_to_int(const char *string)
{
    // The integer value we construct and return
    int value = 0;

    // Loop over all the characters in the string, one by one,
    // until the string null-terminator is reached
    for (unsigned i = 0; string[i] != '\0'; ++i)
    {
        // Get the current character
        char c = string[i];

        // Convert the digit character to its corresponding numeric value
        int c_value = c - '0';

        // Add the character's numeric value to the current value
        value = (value * 10) + c_value;

        // Note the multiplication with 10: That's because decimal numbers are base 10
    }

    // Now the string has been converted to its decimal integer value, return it
    return value;
}

If we call it with the string "123" and unroll the loop what's happening is this:

// First iteration
char c = string[0];  // c = '1'
int c_value = c - '0';  // c_value = 1
value = (value * 10) + c_value;  // value = (0 * 10) + 1 = 0 + 1 = 1

// Second iteration
char c = string[1];  // c = '2'
int c_value = c - '0';  // c_value = 2
value = (value * 10) + c_value;  // value = (1 * 10) + 2 = 10 + 2 = 12

// Third iteration
char c = string[2];  // c = '3'
int c_value = c - '0';  // c_value = 3
value = (value * 10) + c_value;  // value = (12 * 10) + 3 = 120 + 3 = 123

In the fourth iteration we reach the string null-terminator and the loop ends with value being equal to the int value 123.

I hope this makes it a little clearer how string-to-number conversions work.


While the above is for strings, if you read the raw binary bits of an existing int value, then you should not call strtol because the data isn't a string.

Instead you basically interpret the four bytes as a single 32-bit value.

Unfortunately it's not easy to explain how these bits are interpreted without knowing a thing or two about endianness.

Endianness is the order in which the bytes that make up an integer value are stored. Taking the (hexadecimal) number 0x01020304, its bytes can be stored either as 0x01, 0x02, 0x03 and 0x04 (this is called big-endian), or as 0x04, 0x03, 0x02 and 0x01 (this is called little-endian).
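
A quick way to see which order your own machine uses is to copy an integer's in-memory representation into a byte array and look at the first byte. This is only a sketch; store_native32 and host_is_little_endian are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory representation of value
   into out, in whatever byte order the host uses. */
void store_native32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the host stores the least significant byte first
   (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned char b[4];
    store_native32(0x01020304u, b);
    return b[0] == 0x04;
}
```

On a little-endian host the array filled by store_native32(0x01020304u, b) holds 0x04, 0x03, 0x02, 0x01; on a big-endian host it holds 0x01, 0x02, 0x03, 0x04.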

On a little-endian system (your normal PC-like system) say you have an array like this:

char bytes[4] = { 0x04, 0x03, 0x02, 0x01 };

then you could copy it into an int (assuming int is 4 bytes, as it is on x86-64):

int value;
memcpy(&value, bytes, 4);

and that will make the int variable value equal to 0x01020304.
