How to efficiently count leading zeros in a 24 bit unsigned integer?

Most software implementations of clz() are optimized for 32 bit unsigned integers.

How to efficiently count leading zeros in a 24 bit unsigned integer?

UPD. Target's characteristics:

CHAR_BIT                 24
sizeof(int)              1
sizeof(long int)         2
sizeof(long long int)    3

CodePudding user response：

Convert the 24 bit integer into a 32 bit one (either by type punning or by explicitly shuffling the bits around), then do the 32 bit clz, and subtract 8.

Why do it that way? Because in this day and age you'll be hard pressed to find a machine that deals with 24 bit types, natively, in the first place.
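A minimal sketch of that idea, written in Python so it can be run on any host. The names `clz32` and `clz24` are illustrative, and the binary search below only stands in for whichever 32 bit clz you already have:

```python
def clz32(x):
    """Count leading zeros of a 32-bit value with a binary search."""
    if x == 0:
        return 32
    n = 0
    if x <= 0x0000FFFF:
        n += 16
        x <<= 16
    if x <= 0x00FFFFFF:
        n += 8
        x <<= 8
    if x <= 0x0FFFFFFF:
        n += 4
        x <<= 4
    if x <= 0x3FFFFFFF:
        n += 2
        x <<= 2
    if x <= 0x7FFFFFFF:
        n += 1
    return n


def clz24(x):
    """Widen a 24-bit value to 32 bits, count, then drop the 8 padding zeros."""
    return clz32(x & 0xFFFFFF) - 8


for value in (0, 1, 0x000100, 0x800000, 0xFFFFFF):
    print(f"0x{value:06x} -> {clz24(value)}")
```

Note that an input of 0 comes out as 24 here only because this `clz32` defines the zero case as 32; a 32 bit routine that leaves zero undefined would need a separate check.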

CodePudding user response：

TL;DR: The C program is near the end of this answer.

Assuming your hypothetical target machine is capable of correctly implementing unsigned 24-bit multiplication (which must return the low-order 24 bits of the product), you can use the same trick as is shown in the answer you link. It's worth trying to understand what's going on in that answer:

The input is reduced to a small set of values, where all integers with the same number of leading zeros map to the same value. The simple way of doing that is to flood every bit to cover all the bit positions to the right of it:

    x |= x>>1;
    x |= x>>2;
    x |= x>>4;
    x |= x>>8;
    x |= x>>16;

That works for 17 to 32 bits. If your target datatype has 9 to 16 bits, you can leave off the last shift-and-or, because there is no bit position 16 bits to the right of any bit, and so on for narrower types. With 24 bits, you need all five shift-and-or steps.

With that, you've turned x into one of 25 values (for 24-bit ints):

       x clz         x clz         x clz         x clz         x clz
-------- ---  -------- ---  -------- ---  -------- ---  -------- ---
0x000000  24  0x00001f  19  0x0003ff  14  0x007fff   9  0x0fffff   4
0x000001  23  0x00003f  18  0x0007ff  13  0x00ffff   8  0x1fffff   3
0x000003  22  0x00007f  17  0x000fff  12  0x01ffff   7  0x3fffff   2
0x000007  21  0x0000ff  16  0x001fff  11  0x03ffff   6  0x7fffff   1
0x00000f  20  0x0001ff  15  0x003fff  10  0x07ffff   5  0xffffff   0
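The flooding step can be tried out on a host machine with a short Python sketch. The 24-bit register is modelled by masking with `MASK24`, and the function name `flood24` is made up for this illustration:

```python
MASK24 = 0xFFFFFF  # model a 24-bit unsigned register


def flood24(x):
    """Copy the highest set bit into every bit position to its right."""
    x &= MASK24
    for shift in (1, 2, 4, 8, 16):
        x |= x >> shift
    return x


# Every input should collapse to one of the 25 values 2**k - 1, k = 0..24.
targets = {(1 << k) - 1 for k in range(25)}
for x in (0, 1, 0x000100, 0x123456, 0x800000, 0xFFFFFF):
    y = flood24(x)
    assert y in targets
    print(f"0x{x:06x} -> 0x{y:06x}  clz = {24 - y.bit_length()}")
```

Each printed row should correspond to one entry of the table above.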

Now, to turn x into clz, we need a good hash function. We don't necessarily expect that hash(x)==clz, but we want the 25 possible x values to hash to different numbers, ideally in a small range. As with the link you provide, the hash function we'll choose is to multiply by a carefully-chosen multiplicand and then mask off a few bits. Using a mask means that we need to choose five bits; in theory, we could use a 5-bit mask anywhere in the 24-bit word, but in order to not have to think too much, I just chose the five high-order bits, the same as the 32-bit solution. Unlike the 32-bit solution, I didn't bother adding 1, and I expect distinct values for all 25 possible inputs. The equivalent isn't possible with a five-bit mask and 33 possible clz values (as in the 32-bit case), so the 32-bit solution has to jump through an additional hoop if the original input was 0.
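As a rough illustration of what "hash to different numbers" means here, the following Python sketch checks whether a given multiplier separates all 25 flooded values. The helper names `hash5` and `separates_all` are invented for this example. 0xCAE8F is the multiplier this answer settles on further down, and 0x80000 is included only as an example of a multiplier that collides:

```python
MASK24 = 0xFFFFFF  # keep only the low-order 24 bits of the product


def hash5(x, multiplier):
    """Top five bits of the 24-bit product x * multiplier."""
    return ((x * multiplier) & MASK24) >> 19


def separates_all(multiplier):
    """True if the 25 flooded values 2**k - 1 get 25 distinct hashes."""
    targets = [(1 << k) - 1 for k in range(25)]
    return len({hash5(t, multiplier) for t in targets}) == len(targets)


print(separates_all(0xCAE8F))  # multiplier chosen later in this answer
print(separates_all(0x80000))  # here the hash reduces to t & 0x1f
```

With 0x80000 the product is just the input shifted left by 19, so the hash is the low five bits of the input and every flooded value from 0x1f upwards lands in the same slot.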

Since the hash function doesn't directly produce the clz value, but rather a number between 0 and 31, we need to translate the result to a clz value, which uses a 32-byte lookup table, called debruijn in the 32-bit algorithm for reasons I'm not going to get into.
An interesting question is how to select a multiplier with the desired characteristics. One possibility would be to do a bunch of number theory to elegantly discover a solution. That's how it was done decades ago, but these days I can just write a quick-and-dirty Python program to do a brute force search over all the possible multipliers. After all, in the 24-bit case there are only about 16 million possibilities and lots of them work. The actual Python code I used is:
```
# Compute the 25 target values
targ=[2**i - 1 for i in range(25)]
# For each possible multiplier, compute all 25 hashes, and see if they
# are all different (that is, the set of results has size 25):
next(i for i in range(2**19, 2**24)
       if len(targ)==len(set(((i * t) >> 19) & 0x1f
                              for t in targ)))
```
Calling next on a generator expression returns the first generated value, which in this case is 0x8CB4F, or 576335. Since the search starts at 0x80000 (which is the smallest multiplier for which hash(1) is not 0), the result was printed instantly. I then spent a few more milliseconds generating all the possible multipliers between 2¹⁹ and 2²⁰, of which there are 90, and selected 0xCAE8F (831119) for purely personal aesthetic reasons. The last step is to create the lookup table from the computed hash function. (I'm not saying this is good Python. I just took it from my command history; I might come back and clean it up later. But I included it for completeness.):
```
lut = dict((i,-1) for i in range(32))
lut.update((((v * 0xcae8f) >> 19) & 0x1f, 24 - i)
           for i, v in enumerate(targ))
print("  static const char lut[] = {\n    " +
      ",\n    ".join(', '.join(f"{lut[i]:2}" for i in range(j, j+8))
                     for j in range(0, 32, 8)) +
      "\n  };\n")
# The result is pasted into the C code below.
```

So then it's just a question of assembling the C code:

// Assumes that `unsigned int` has 24 value bits.
int clz(unsigned x) {
  static const char lut[] = {
    24, 23,  7, 18, 22,  6, -1,  9,
    -1, 17, 15, 21, 13,  5,  1, -1,
     8, 19, 10, -1, 16, 14,  2, 20,
    11, -1,  3, 12,  4, -1,  0, -1
  };
  x |= x>>1;
  x |= x>>2;
  x |= x>>4;
  x |= x>>8;
  x |= x>>16;
  return lut[((x * 0xcae8f) >> 19) & 0x1f];
}

The test code calls clz on every 24-bit integer in turn. Since I don't have a 24-bit machine handy, I just assume that the arithmetic will work the same on the hypothetical 24-bit machine in the OP.

#include <stdio.h>

// For each 24-bit integer in turn (from 0 to 2**24-1), if
// clz(i) is different from clz(i-1), print clz(i) and i.
//
// Expected output is 0 and the powers of 2 up to 2**23, with
// descending clz values from 24 to 0.
int main(void) {
  int prev = -1;
  for (unsigned i = 0; i < (1u << 24); ++i) {
    int pfxlen = clz(i);
    if (pfxlen != prev) {
      printf("%2d 0x%06X\n", pfxlen, i);
      prev = pfxlen;
    }
  }
  return 0;
}
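The assumption that a wider host gives the same answers as the 24-bit target can be probed with a small model. The sketch below, in Python like the search script earlier in this answer, truncates every intermediate result to 24 bits and compares the outcome with Python's `int.bit_length`. The names `MASK24`, `clz24_model` and `check_boundaries` are illustrative and not part of the original code. To keep the run short it only checks the values around each power of two rather than all 2**24 inputs:

```python
MASK24 = 0xFFFFFF  # emulate a 24-bit unsigned int by masking

# Same table as in the C code above.
LUT = [
    24, 23,  7, 18, 22,  6, -1,  9,
    -1, 17, 15, 21, 13,  5,  1, -1,
     8, 19, 10, -1, 16, 14,  2, 20,
    11, -1,  3, 12,  4, -1,  0, -1,
]


def clz24_model(x):
    """Mirror of the C clz() with every step truncated to 24 bits."""
    x &= MASK24
    for shift in (1, 2, 4, 8, 16):
        x |= x >> shift
    product = (x * 0xCAE8F) & MASK24  # low-order 24 bits of the product
    return LUT[(product >> 19) & 0x1F]


def check_boundaries():
    """Compare with bit_length() just below, at, and just above each power of two."""
    for k in range(24):
        for x in ((1 << k) - 1, 1 << k, (1 << k) + 1):
            assert clz24_model(x) == 24 - x.bit_length(), hex(x)
    assert clz24_model(MASK24) == 0
    return True


print(check_boundaries())
```

This is a model of the arithmetic only; it says nothing about how a particular compiler for the real target handles shifts or multiplication.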

CodePudding user response：

I would look for the builtin function or intrinsic available for your platform and compiler. Those functions usually implement the most efficient way of finding the most significant set bit. For example, GCC has the __builtin_clz function.

If the 24 bit integer is stored in a byte array (for example, received from a sensor):

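#include <limits.h>   // CHAR_BIT
#include <string.h>   // memcpy

// Number of padding bits above the 24-bit value in the type of x.
#define BITS(x)  ((int)(CHAR_BIT * sizeof(x)) - 24)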
int unaligned24clz(const void * restrict val)
{
    unsigned u = 0;
    memcpy(&u, val, 3);

    #if defined(__GNUC__)
    return __builtin_clz(u) - BITS(u);
    #elif defined(__ICCARM__)
    return __CLZ(u) - BITS(u);
    #elif defined(__arm__)
    return __clz(u) - BITS(u);
    #else 
    return clz(u) - BITS(u); //portable version using standard C features
    #endif
}
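One thing to keep in mind with the byte-array version: copying three bytes into the start of a zeroed `unsigned` yields the intended value only when the buffer holds the least significant byte first and the host is little-endian. The Python sketch below is not a port of the C function; it only shows how the byte order changes the result. The function name `clz24_from_bytes` and the sample bytes are invented for the illustration:

```python
def clz24_from_bytes(buf, byteorder="little"):
    """Leading zeros of the 24-bit value stored in the first three bytes of buf."""
    value = int.from_bytes(buf[:3], byteorder)
    return 24 - value.bit_length()


sample = bytes([0x56, 0x34, 0x12])  # 0x123456 stored least significant byte first

print(clz24_from_bytes(sample, "little"))  # buffer read as 0x123456
print(clz24_from_bytes(sample, "big"))     # same bytes read as 0x563412
```

If the sensor delivers the most significant byte first, assembling the value with explicit shifts instead of memcpy avoids depending on the host's byte order.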

If it is stored in a valid integer:

int clz24(const unsigned u)
{
    #if defined(__GNUC__)
    return __builtin_clz(u) - BITS(u);
    #elif defined(__ICCARM__)
    return __CLZ(u) - BITS(u);
    #elif defined(__arm__)
    return __clz(u) - BITS(u);
    #else 
    return clz(u) - BITS(u); //portable version using standard C features
    #endif
}

https://godbolt.org/z/z6n1rKjba

You can add support for more compilers if you need it.

Remember that the result of __builtin_clz is undefined if the value is 0, so you will need to add a separate check for that case.