Set the leading zero bits in any size integer C-CodePudding

I want to set the leading zero bits in any size integer to 1 in standard C .

eg.

0001 0011 0101 1111 -> 1111 0011 0101 1111

All the algorithms I've found to do this require a rather expensive leading zero count. However, it's odd. There are very fast and easy ways to do other types of bit manipulation such as:

 int y = -x & x; //Extracts lowest set bit, 1110 0101 -> 0000 0001

 int y = (x   1) & x; //Will clear the trailing ones, 1110 0101 - > 1110 0100

 int y = (x - 1) | x; //Will set the trailing zeros, 0110 0100 - > 0110 0111

So that makes me think there must be a way to set the leading zeros of an integer in one simple line of code consisting of basic bit wise operators. Please tell me there's hope because right now I'm settling for reversing the order of the bits in my integer and then using the fast way of setting trailing zeros and then reversing the integer again to get my leading zeros set to ones. Which is actually significantly faster than using a leading zero count, however still quite slow compared with the other algorithms above.

 template<typename T>
 inline constexpr void reverse(T& x)
 {
    T rev = 0;
    size_t s = sizeof(T) * CHAR_BIT;

    while(s > 0)
    {
        rev = (rev << 1) | (x & 0x01);
        x >>= 1;
        s -= 1uz;
    }//End while

    x = rev;
 }

 
 template<typename T>
 inline constexpr void set_leading_zeros(T& x)
 {

     reverse(x);

     x = (x - 1) | x;//Set trailing 0s to 1s
     
     reverse(x);
 }

Edit

Because some asked: I'm working with MS-DOS running on CPUs ranging from early X86 to a 486DX installed in older CNC machines. Fun times. :D

CodePudding user response：

The leading zeroes can be set without counting them, while also avoiding reversing the integer. For convenience I won't do it for a generic integer type T, but likely it can be adapted, or you could use template specialization.

First calculate the mask of all the bits that aren't the leading zeroes, by "spreading" the bits downwards:

uint64_t m = x | (x >> 1);
m |= m >> 2;
m |= m >> 4;
m |= m >> 8;
m |= m >> 16;
m |= m >> 32;

Then set all the bits that that mask doesn't cover:

return x | ~m;

Bonus: this automatically works even when x = 0 and when x has all bits set, one of which in a count-leading-zero approach could lead to an overly large shift amount (which one depends on the details, but almost always one of them is troublesome, since there are 65 distinct cases but only 64 valid shift amounts, if we're talking about uint64_t).

CodePudding user response：

Your reverse is extremely slow! With an N-bit int you need N iterations to reverse, each at least 6 instructions, then at least 2 instructions to set the trailing bits, and finally N iterations to reverse the value again. OTOH even the simplest leading zero count needs only N iterations, then set the leading bits directly:

template<typename T>
inline constexpr T trivial_ilog2(T x) // Slow, don't use this
{
    if (x == 0) return 0;

    size_t c{};
    while(x)
    {
        x >>= 1;
        c  = 1u;
    }

    return c;
}

template<typename T>
inline constexpr T set_leading_zeros(T x)
{
    if (std::make_unsigned_t(x) >> (sizeof(T) * CHAR_BIT - 1)) // top bit is set
        return x;
    return x | (-T(1) << trivial_ilog2(x));
}

x = set_leading_zeros(x);

There are many other ways to count leading zero/get integer logarithm much faster. One of the methods involves doing in steps of powers of 2 like how to create the mask in harold's answer:

But since you're targeting a specific target instead of doing something cross-platform and want to squeeze every bit of performance, there are almost no reasons to use pure standard features since these usecases typically need platform-specific code. If intrinsics are available you should use it, for example in modern C there's std::countl_zero but each compiler already has intrinsics to do that which will map to the best instruction sequence for that platform, for example _BitScanReverse or __builtin_clz

If intrinsics aren't available of if the performance is still not enough then try a lookup table. For example here's a solution with 256-element log table

static const char LogTable256[256] = 
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
    -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
    LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};

uint16_t lut_ilog2_16(uint16_t x)
{
    uint8_t h = x >> 8;
    if (h) return LogTable256[h]   8;
    else return LogTable256[x & 0xFF];
}

In set_leading_zeros just call lut_ilog2_16 like above

The even better solution than a log table is a mask table so that you can get the mask directly instead of calculating 1 << LogTable256[x]

static const char MaskTable256[256] =
{
    0xFF, 0xFE, 0xFC...
}

Some other notes:

1uz isn't a valid suffix in C
Don't use references for small types that fit in a single integer. That's not necessary and is usually slower when not inlined. Just assign the result back from the function

CodePudding user response：

You could count leading zeroes using std::countl_zero and create a bitmask that your bitwise OR with the original value:

#include <bit>
#include <climits>
#include <type_traits>

template<class T>
requires std::is_unsigned_v<T>
T leading_ones(T v) {
    auto lz = std::countl_zero(v);
    return lz ? v | ~T{} << (CHAR_BIT * sizeof v - lz) : v;
}

If you have a std::uint16_t, like

0b0001001101011111

then ~T{} is 0b1111111111111111, CHAR_BIT * sizeof v is 16 and countl_zero(v) is 3. Left shift 0b1111111111111111 16-3 steps:

0b1110000000000000

Bitwise OR with the original:

  0b0001001101011111
| 0b1110000000000000
--------------------
= 0b1111001101011111