Most software implementations of clz() are optimized for 32-bit unsigned integers.
How can I efficiently count leading zeros in a 24-bit unsigned integer?
UPD. Target's characteristics:
CHAR_BIT 24
sizeof(int) 1
sizeof(long int) 2
sizeof(long long int) 3
CodePudding user response:
Convert the 24-bit integer into a 32-bit one (either by type punning or by explicitly shuffling the bits around), then do the 32-bit clz and subtract 8.
Why do it that way? Because in this day and age you'll be hard pressed to find a machine that deals with 24-bit types natively in the first place.
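A minimal sketch of that approach, assuming the 24-bit value already sits in the low 24 bits of a uint32_t. The name clz32 is a hypothetical portable helper (a binary search over the bits), not a library function; substitute whatever 32-bit clz you already have.

```c
#include <stdint.h>

/* Hypothetical portable 32-bit clz; returns 32 for x == 0. */
int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0) return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* 24-bit clz: widen, count in 32 bits, drop the 8 padding zeros. */
int clz24_via32(uint32_t x24)
{
    return clz32(x24 & 0xFFFFFFu) - 8;
}
```

With this helper a zero input yields 24, because clz32 is defined to return 32 for 0. A clz with undefined behaviour at 0 would need an explicit check instead.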
CodePudding user response:
TL;DR: See the C program near the end of this answer.
Assuming your hypothetical target machine is capable of correctly implementing unsigned 24-bit multiplication (which must return the low-order 24 bits of the product), you can use the same trick as is shown in the answer you link. It's worth trying to understand what's going on in that answer:
The input is reduced to a small set of values, where all integers with the same number of leading zeros map to the same value. The simple way of doing that is to flood every bit to cover all the bit positions to the right of it:
x |= x>>1; x |= x>>2; x |= x>>4; x |= x>>8; x |= x>>16;
That will work for 17 up to 32 bits; if your target datatype has 9 to 16 bits, you could leave off the last shift-and-or because there is no bit position 16 bits to the right of any bit. And so on. But with 24 bits, you'll want all five shift-and-or.
With that, you've turned x into one of 25 values (for 24-bit ints):
x        clz  x        clz  x        clz  x        clz  x        clz
-------- ---  -------- ---  -------- ---  -------- ---  -------- ---
0x000000  24  0x00001f  19  0x0003ff  14  0x007fff   9  0x0fffff   4
0x000001  23  0x00003f  18  0x0007ff  13  0x00ffff   8  0x1fffff   3
0x000003  22  0x00007f  17  0x000fff  12  0x01ffff   7  0x3fffff   2
0x000007  21  0x0000ff  16  0x001fff  11  0x03ffff   6  0x7fffff   1
0x00000f  20  0x0001ff  15  0x003fff  10  0x07ffff   5  0xffffff   0
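As a small illustration of the flooding step, here is a sketch wrapped in a function. The name smear24 is made up for this example, and uint32_t stands in for the target's 24-bit unsigned type.

```c
#include <stdint.h>

/* Flood the highest set bit into every lower position.
   Assumes x fits in 24 bits; uint32_t is a stand-in for the
   target's 24-bit unsigned type. */
uint32_t smear24(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x;
}
```

For example, smear24(0x000100) gives 0x0001ff (clz 15) and smear24(0x123456) gives 0x1fffff (clz 3), both of which appear in the table.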
Now, to turn x into clz, we need a good hash function. We don't necessarily expect that hash(x)==clz, but we want the 25 possible x values to hash to different numbers, ideally in a small range. As with the link you provide, the hash function we'll choose is to multiply by a carefully chosen multiplier and then mask off a few bits. Using a mask means that we need to choose five bits; in theory, we could use a 5-bit mask anywhere in the 24-bit word, but in order not to have to think too much, I just chose the five high-order bits, the same as the 32-bit solution. Unlike the 32-bit solution, I didn't bother adding 1, and I expect to get distinct values for all 25 possible inputs. The equivalent isn't possible with a five-bit mask and 33 possible clz values (as in the 32-bit case), so they have to jump through an additional hoop if the original input was 0.
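The requirement on the multiplier can be written down as a short check. This is only a sketch: hash24 and is_perfect_multiplier are names invented here, and uint32_t arithmetic masked to 24 bits stands in for the target's 24-bit multiplication.

```c
#include <stdint.h>

/* Top five bits of the low 24 bits of x * m. */
unsigned hash24(uint32_t x, uint32_t m)
{
    return (unsigned)(((x * m) & 0xFFFFFFu) >> 19);
}

/* Returns 1 if m maps all 25 flooded values (2^k - 1, k = 0..24)
   to distinct 5-bit hashes, 0 otherwise. */
int is_perfect_multiplier(uint32_t m)
{
    uint32_t seen = 0;                  /* one bit per hash value 0..31 */
    for (int k = 0; k <= 24; ++k) {
        uint32_t x = (UINT32_C(1) << k) - 1;
        uint32_t bit = UINT32_C(1) << hash24(x, m);
        if (seen & bit) return 0;       /* collision */
        seen |= bit;
    }
    return 1;
}
```

The multiplier 0xCAE8F chosen later in this answer passes this check, while a trivial multiplier such as 1 fails because both 0 and 1 hash to 0.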
Since the hash function doesn't directly produce the clz value, but rather a number between 0 and 31, we need to translate the result to a clz value, which uses a 32-byte lookup table, called debruijn in the 32-bit algorithm for reasons I'm not going to get into.
An interesting question is how to select a multiplier with the desired characteristics. One possibility would be to do a bunch of number theory to elegantly discover a solution. That's how it was done decades ago, but these days I can just write a quick-and-dirty Python program to do a brute force search over all the possible multipliers. After all, in the 24-bit case there are only about 16 million possibilities and lots of them work. The actual Python code I used is:
# Compute the 25 target values
targ=[2**i - 1 for i in range(25)]
# For each possible multiplier, compute all 25 hashes, and see if they
# are all different (that is, the set of results has size 25):
next(i for i in range(2**19, 2**24)
     if len(targ)==len(set(((i * t) >> 19) & 0x1f for t in targ)))
Calling next on a generator expression returns the first generated value, which in this case is 0x8CB4F, or 576335. Since the search starts at 0x80000 (which is the smallest multiplier for which hash(1) is not 0), the result was printed instantly. I then spent a few more milliseconds generating all the possible multipliers between 2**19 and 2**20, of which there are 90, and selected 0xCAE8F (831119) for purely personal aesthetic reasons. The last step is to create the lookup table from the computed hash function. (Not saying this is good Python. I just took it from my command history; I might come back and clean it up later. But I included it for completeness.):
lut = dict((i,-1) for i in range(32))
lut.update((((v * 0xcae8f) >> 19) & 0x1f, 24 - i) for i, v in enumerate(targ))
print("  static const char lut[] = {\n    " +
      ",\n    ".join(', '.join(f"{lut[i]:2}" for i in range(j, j + 8))
                      for j in range(0, 32, 8)) +
      "\n  };\n")
# The result is pasted into the C code below.
So then it's just a question of assembling the C code:
// Assumes that `unsigned int` has 24 value bits.
int clz(unsigned x) {
  static const char lut[] = {
    24, 23,  7, 18, 22,  6, -1,  9,
    -1, 17, 15, 21, 13,  5,  1, -1,
     8, 19, 10, -1, 16, 14,  2, 20,
    11, -1,  3, 12,  4, -1,  0, -1
  };
  x |= x>>1; x |= x>>2; x |= x>>4; x |= x>>8; x |= x>>16;
  return lut[((x * 0xcae8f) >> 19) & 0x1f];
}
The test code calls clz on every 24-bit integer in turn. Since I don't have a 24-bit machine handy, I just assume that the arithmetic will work the same on the hypothetical 24-bit machine in the OP.
#include <stdio.h>

// For each 24-bit integer in turn (from 0 to 2**24-1), if
// clz(i) is different from clz(i-1), print clz(i) and i.
//
// Expected output is 0 and the powers of 2 up to 2**23, with
// descending clz values from 24 to 0.
int main(void) {
  int prev = -1;
  for (unsigned i = 0; i < 1<<24; ++i) {
    int pfxlen = clz(i);
    if (pfxlen != prev) {
      printf("%2d 0x%06X\n", pfxlen, i);
      prev = pfxlen;
    }
  }
  return 0;
}
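If you would rather have the machine do the comparison than read the printed transitions, one option is to check a candidate against a naive reference. The sketch below is not part of the original test: clz24_ref and count_mismatches are hypothetical names, and it assumes unsigned has at least 24 bits.

```c
/* Naive reference: shift left until bit 23 is set. Returns 24 for 0. */
int clz24_ref(unsigned x)
{
    int n = 0;
    x &= 0xFFFFFFu;
    if (x == 0) return 24;
    while ((x & 0x800000u) == 0) { x <<= 1; ++n; }
    return n;
}

/* Count the inputs in [0, 2^24) where candidate disagrees with
   the reference. A result of 0 means the two agree everywhere. */
unsigned long count_mismatches(int (*candidate)(unsigned))
{
    unsigned long bad = 0;
    for (unsigned long i = 0; i < (1ul << 24); ++i)
        if (candidate((unsigned)i) != clz24_ref((unsigned)i)) ++bad;
    return bad;
}
```

Passing the table-driven clz from above as the candidate, as in count_mismatches(clz), should return 0 if the lookup table is right.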
CodePudding user response:
I would look for the builtin function or intrinsic available for your platform and compiler. Those functions usually implement the most efficient way of finding the position of the most significant set bit. For example, GCC has the __builtin_clz function.
If the 24-bit integer is stored in a byte array (for example, received from a sensor):
#include <limits.h>   /* CHAR_BIT */
#include <string.h>   /* memcpy */
/* Number of padding bits above the 24-bit value in the type of x. */
#define BITS(x) ((int)(CHAR_BIT * sizeof(x)) - 24)
int unaligned24clz(const void * restrict val)
{
    unsigned u = 0;
    memcpy(&u, val, 3);
#if defined(__GNUC__)
    return __builtin_clz(u) - BITS(u);
#elif defined(__ICCARM__)
    return __CLZ(u) - BITS(u);
#elif defined(__arm__)
    return __clz(u) - BITS(u);
#else
    return clz(u) - BITS(u); //portable version using standard C features
#endif
}
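Note that the memcpy version lands the three bytes in the low-order part of u only on a little-endian host with little-endian data. A byte-order-independent alternative is to assemble the value with shifts. This is a sketch, and both the name load24_be and the big-endian wire layout are assumptions for illustration.

```c
#include <stdint.h>

/* Assemble a 24-bit value from three bytes, most significant first.
   The big-endian layout is an assumption; reverse the indices for a
   little-endian source. */
uint32_t load24_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}
```

The result can then be passed to whichever clz variant you use for values already held in an integer.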
If it is stored in a valid integer:
int clz24(const unsigned u)
{
#if defined(__GNUC__)
    return __builtin_clz(u) - BITS(u);
#elif defined(__ICCARM__)
    return __CLZ(u) - BITS(u);
#elif defined(__arm__)
    return __clz(u) - BITS(u);
#else
    return clz(u) - BITS(u); //portable version using standard C features
#endif
}
https://godbolt.org/z/z6n1rKjba
You can add support for more compilers if you need it.
Remember that if the value is 0, the result of __builtin_clz is undefined, so you will need to add another check.
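One way to add that check is sketched below. The name clz24_safe is hypothetical, and the code assumes the input is below 2^24 and that unsigned int has at least 24 bits. Compilers without __builtin_clz fall back to a plain loop.

```c
#include <limits.h>

/* Zero-safe 24-bit clz. Uses __builtin_clz where available (GCC and
   compatible compilers) and a simple shift loop elsewhere. */
int clz24_safe(unsigned u)
{
    if (u == 0)
        return 24;                      /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return __builtin_clz(u) - (int)(CHAR_BIT * sizeof u - 24);
#else
    int n = 0;
    while ((u & 0x800000u) == 0) { u <<= 1; ++n; }
    return n;
#endif
}
```

Returning 24 for a zero input matches the table-driven answer above. Pick a different sentinel if your caller expects one.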