Problems reading number given endianness (C to Javascript)


I am coding a parser (writing Node.js code) where the C spec defines a little-endian byte order, which I take to mean the most significant bits are on the right end.

At some point in the description, they say this:

Most significant bits (MSB) is on the left, the information is encoded in a 32-bit unsigned integer as follows 12bit (year) | 4 bit (month)...

So it seems logical to me to parse the bits this way (pseudocode):

const number = readUint32BE(data) //unsigned integer 32 bits
const year = (number >> 20)

Does that make sense logically?

However, I get the wrong numbers that way, and I get the right numbers (reasonable years) if I do:

const number = readUint32LE(data) //unsigned integer 32 bits
const year = (number >> 20)

Any help understanding where my thinking goes wrong, please?

CodePudding user response:

If you are reading data in from the outside, byte order truly matters.
Examples:

/* read two-byte integer, little-endian (low byte first): */
unsigned short x = getc(ifp); x |= getc(ifp) << 8;

/* read two-byte integer, big-endian (high byte first): */
unsigned short y = getc(ifp); y = (y << 8) | getc(ifp);
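Since the question is about Node.js, here is roughly the same pair of reads using the Buffer API; the two-byte buffer contents are just an illustrative example:

const buf = Buffer.from([0x01, 0x02]);     // two bytes as read from the outside

const le = buf.readUInt16LE(0);            // 0x0201: low byte first
const be = buf.readUInt16BE(0);            // 0x0102: high byte first

// or by hand, mirroring the C above:
const leManual = buf[0] | (buf[1] << 8);   // 0x0201
const beManual = (buf[0] << 8) | buf[1];   // 0x0102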

If you are using a byte pointer to access the bytes of an integer in memory, byte order truly matters. Example:

uint32_t x = 0x04030201;
unsigned char *p = (unsigned char *)&x;   /* points at the first byte of x in memory */
printf("%02x\n", *p);

On a little-endian machine, this prints 01. On a big-endian machine, this prints 04. On a little-endian machine, the pointer &x literally points at the "little end" of the 4-byte integer 0x04030201.
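The same experiment can be run from JavaScript with typed arrays, which use the host machine's native byte order (a small sketch):

const words = new Uint32Array([0x04030201]);
const bytes = new Uint8Array(words.buffer);   // view the same memory byte by byte
console.log(bytes[0].toString(16));           // "1" on little-endian, "4" on big-endian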

But when you're doing arithmetic, and when you're doing bitwise operations, on values that are multibyte quantities, byte order does not matter. These operations all take place on the full value, not on the individual bytes of the value. Examples:

uint32_t x = 0x04030201;
unsigned char msbyte = x >> 24;      /* always most-significant byte */
unsigned char lsbyte = x & 0xff;     /* always least-significant byte */
unsigned char lsbit = x & 0x01;      /* always least-significant bit */
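The same extractions look almost identical in JavaScript; the only caveat is that >>> (the unsigned right shift) is the safer choice, since JavaScript's bitwise operators otherwise treat their operands as signed 32-bit integers:

const x = 0x04030201;
const msbyte = x >>> 24;      // always the most-significant byte (0x04)
const lsbyte = x & 0xff;      // always the least-significant byte (0x01)
const lsbit  = x & 0x01;      // always the least-significant bit (1)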

In the original question, given that "the C spec defines LittleEndian", the pseudocode

const number = readUint32LE(data)

would be correct, and the alternative

const number = readUint32BE(data)     // WRONG

would be incorrect. However, once the number has been read in correctly, it's now a proper value, and byte order considerations no longer apply. The stipulation "Most significant bit is on the left" is unnecessary and somewhat misleading. Code like

const year = (number >> 20)

to extract the 12 most-significant bits is perfectly correct.
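Putting it together for the original question, a minimal Node.js sketch might look like the following. The function name and the example buffer are made up for illustration, and only the 12-bit year and 4-bit month fields are taken from the question; what the remaining 16 bits hold is not specified there.

// Parse the 32-bit little-endian field starting at 'offset' in 'buf'.
function parseDate(buf, offset = 0) {
  const number = buf.readUInt32LE(offset);   // byte order is handled here, once
  const year   = number >>> 20;              // top 12 bits
  const month  = (number >>> 16) & 0x0f;     // next 4 bits
  return { year, month };
}

// Example: year 2023 (0x7E7), month 9, remaining 16 bits zero,
// stored least-significant byte first:
const data = Buffer.from([0x00, 0x00, 0x79, 0x7e]);
console.log(parseDate(data));                // { year: 2023, month: 9 }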
