Unpacking 4 bit chunks in Java int optimization


In order to unpack the first and second 4-bit chunks from an int, I use this approach:

int chunks = 0b1111_1110_1101_1100_1011_1010_1001_1000;
int mask = 0x0f << (Integer.SIZE - 4);
byte result = (byte) ((chunks & mask) >>> (Integer.SIZE - 4));
Integer.toBinaryString(result); // returns "1111"

int chunks = 0b1111_1110_1101_1100_1011_1010_1001_1000;
int mask = 0x0f << (Integer.SIZE - 8);
byte result = (byte) ((chunks & mask) >>> (Integer.SIZE - 8));
Integer.toBinaryString(result); // returns "1110"

It works great when the bit representation of the int chunks starts with a 1 bit.

When I have a number starting with 0000:

int chunks = 0b0000_0110_1101_1100_1011_1010_1001_1000;
int mask = 0x0f << (Integer.SIZE - 4);
byte result = (byte) ((chunks & mask) >>> (Integer.SIZE - 4));
Integer.toBinaryString(result); // returns "0"

int chunks = 0b0000_0110_1101_1100_1011_1010_1001_1000;
int mask = 0x0f << (Integer.SIZE - 8);
byte result = (byte) ((chunks & mask) >>> (Integer.SIZE - 8));
Integer.toBinaryString(result); // returns "110"

It works great as well.

Is this the best approach from a performance perspective? I feel like I overcomplicated it.

CodePudding user response:

These operations are so trivial that they are unlikely to have any impact on performance.

But if you want to dive into it:

  1. Arithmetic operations on byte, short, char, or int operands, as well as local variables, use int anyway. So unless you're actually going to store the value into a heap variable of type byte, there is no advantage in declaring a local variable as byte. The associated type cast implies a sign extension of the lowest eight bits to 32 bits, unless the JVM’s optimizer manages to eliminate it.

  2. There are no bits above the most significant chunk. So when you shift the most significant bits down into the lowest position, there is no need to mask them.

  3. For the second “4 bit chunk” you still need a mask, but you don’t need to shift it into the high position. Instead, you can first shift your bits down, then mask them with 0xf. Since the mask is a constant in either case, there is no performance impact once the JIT compiler has done its work; however, the bytecode will be smaller.
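As a side note on point 1, the effect of the i2b sign extension can be seen directly: casting an int to byte keeps only the low eight bits and sign-extends bit 7, so a byte value with its top bit set comes back negative when widened again. A minimal sketch (class name is mine, not from the answer above):

```java
public class SignExtensionDemo {
    public static void main(String[] args) {
        int v = 0xF0;           // 1111_0000, top bit of the low byte is set
        byte b = (byte) v;      // i2b sign-extends bit 7, so b == -16
        int widened = b;        // widening back keeps the sign extension
        System.out.println(widened);        // -16
        System.out.println(widened & 0xFF); // 240, masking restores the unsigned value
    }
}
```

For 4-bit chunks this never matters in practice, since their values fit in 0..15 and bit 7 of the byte is never set, but the i2b instruction is still emitted.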

So, when we use

public static void main(String[] args) {
  int[] allChunks = {
    0b1111_1110_1101_1100_1011_1010_1001_1000,
    0b0000_0110_1101_1100_1011_1010_1001_1000
  };
  for(int chunks: allChunks) {
    if(firstFourBitChunkOld(chunks) != firstFourBitChunk(chunks))
      throw new AssertionError();
    if(secondFourBitChunkOld(chunks) != secondFourBitChunk(chunks))
      throw new AssertionError();

    System.out.println(Integer.toBinaryString(firstFourBitChunk(chunks)));
    System.out.println(Integer.toBinaryString(secondFourBitChunk(chunks)));
    System.out.println();
  }
}

static int firstFourBitChunk(int chunks) {
  return chunks >>> 28;
}

static int secondFourBitChunk(int chunks) {
  return chunks >>> 24 & 0xf;
}

private static final int MASK_FIRST_FOUR_BITS = 0x0f << (Integer.SIZE - 4);
static byte firstFourBitChunkOld(int chunks) {
  return (byte) ((chunks & MASK_FIRST_FOUR_BITS) >>> (Integer.SIZE - 4));
}

private static final int MASK_SECOND_FOUR_BITS = 0x0f << (Integer.SIZE - 8);
static byte secondFourBitChunkOld(int chunks) {
  return (byte) ((chunks & MASK_SECOND_FOUR_BITS) >>> (Integer.SIZE - 8));
}

It’s debatable whether, e.g., Integer.SIZE - 4 is more readable than 28. I had to look up whether Integer.SIZE means “size in bits”, “size in bytes”, or something entirely different; the name doesn’t tell. I think, generally, developers should know that Java’s int has 32 bits before performing bit manipulations. But since the expression Integer.SIZE - 4 is a constant, this choice has no impact on the compiled code at all.

The code above runs successfully, and we can also compare the resulting bytecode:

  static int firstFourBitChunk(int);
       0: iload_0
       1: bipush        28
       3: iushr
       4: ireturn

  static int secondFourBitChunk(int);
       0: iload_0
       1: bipush        24
       3: iushr
       4: bipush        15
       6: iand
       7: ireturn

  static byte firstFourBitChunkOld(int);
       0: iload_0
       1: ldc           #8                  // int -268435456
       3: iand
       4: bipush        28
       6: iushr
       7: i2b
       8: ireturn

  static byte secondFourBitChunkOld(int);
       0: iload_0
       1: ldc           #10                 // int 251658240
       3: iand
       4: bipush        24
       6: iushr
       7: i2b
       8: ireturn

The i2b is the additional sign extension operation. For loading the shifted mask, an ldc instruction is needed, which loads a value from the constant pool. The constant itself will take another five bytes in the constant pool in this case. Besides that, the codes are equivalent.

As said, it will likely have no practical impact on the actual performance in a normal, optimizing execution environment. However, I consider the shorter variants more readable as well, at least for developers with the necessary understanding of bit arithmetic. There’s no way to make it readable for an audience without that understanding anyway.
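The shorter variants also generalize to any of the eight 4-bit chunks in an int. A hypothetical helper along these lines (not part of the answer above; the name fourBitChunk is mine) could look like:

```java
public class NibbleDemo {
    /** Extracts the n-th 4-bit chunk, counting from the most significant (n = 0..7). */
    static int fourBitChunk(int chunks, int n) {
        // >>> binds tighter than &, so the shift happens first, then the mask
        return chunks >>> (28 - 4 * n) & 0xf;
    }

    public static void main(String[] args) {
        int chunks = 0b1111_1110_1101_1100_1011_1010_1001_1000;
        System.out.println(Integer.toBinaryString(fourBitChunk(chunks, 0))); // 1111
        System.out.println(Integer.toBinaryString(fourBitChunk(chunks, 1))); // 1110
        System.out.println(Integer.toBinaryString(fourBitChunk(chunks, 7))); // 1000
    }
}
```

Since the shift amount is no longer a compile-time constant here, the JIT compiler cannot fold it away as in the fixed-chunk methods, but for a variable chunk index that cost is unavoidable anyway.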
