TL;DR:
How can I quickly find the smallest integer that satisfies int > floor && ((int ^ bits) & mask) == 0
, where floor
, bits
and mask
are known? (Ideally a non-looping, non-branching solution, probably with some serious bit-twiddling involved.)
Background
Suppose I have a list of key-values sorted by key, where each key is an integer. I want to quickly gather each value whose key has certain flags set or clear, i.e. matches the filter (key & mask) == (bits & mask)
or equivalently ((key ^ bits) & mask) == 0
.
In a naïve implementation, I could iterate over every key-value and gather those matching the filter. However, if my matches are sparse, I'd like to take advantage of my list's sortedness to quickly find new matches.
So my current approach is an algorithm that switches back and forth between binary-searching and scanning the list, maintaining the smallest possible future key based on the last key read in. If a search for the key succeeds, it scans until the the read-in key no longer matches the filter. Either way, the search begins anew with a newly computed "smallest possible future key" based on the last key we read in (aka floor
, which doesn't match the filter.)
My testing has shown the algorithm to be correct (thankfully), but I'm using a placeholder iterative method to compute the "next" key instead of a bit-twiddling method like I was hoping to have come up with... which brings us back to the TL;DR!
CodePudding user response:
Here is one approach, though it does not quite meet the strictest version of your criteria. It's not too bad though, a few branches is all.
The condition ((key ^ bits) & mask) == 0
(henceforth "The Condition") requires some bits to take on particular values, I will call those bits the "fixed bits" and other bits the "non-fixed bits". WLOG I will assume that (bits & mask) == bits
.
Consider floor
. How would it have to change to fit The Condition? Some bits may have to be set to match the fixed bits, some bits may have to be reset to match the fixed bits. Sometimes no bits would have to be changed. Let's consider some different cases:
floor
already satisfies The Condition. Butfloor
isn't larger thanfloor
, so we have to increment it, but without violating The Condition. Incrementing a number of which some bits stay fixed has a known solution:(floor | mask) 1 & ~mask | bits
, which could be redundantly parenthesized to(((floor | mask) 1) & ~mask) | bits
- The highest "wrong" bit in
floor
, at indexi
, needs to be set. In that case, setting that bit would enable the lower non-fixed bits to become zero without going belowfloor
. So we reset all the bits belowi
, and then bitwise-OR withbits
to fill in the fixed bits. - The highest "wrong" bit in
floor
, at indexi
, needs to be reset. Resetting it would decrease the value belowfloor
, so the non-fixed bits abovei
need to be incremented. We can use a variant of the trick seen in the first case, but instead of adding1
, we add1 << i
. Then again we reset all the bits below that and OR withbits
.
The conditional addition of 1 << i
if bit i
of x
is set, could be implemented as adding x & (1 << i)
.
There are different ways to actually implement that, for example (somewhat tested):
uint32_t next_above_with_pattern(uint32_t floor, uint32_t mask, uint32_t bits)
{
uint32_t x = floor;
if (((x ^ bits) & mask) == 0)
{
x = (x | mask) 1 & ~mask | bits;
}
else
{
int highestIndexOfViolation = 31 - (int)_lzcnt_u32((x ^ bits) & mask);
x = (x | mask) (x & (1UL << highestIndexOfViolation)) & ~mask | bits;
x &= ~((1UL << highestIndexOfViolation) - 1);
x |= bits;
}
return x;
}
With some changes as suggested by Ben
uint32_t next_above_with_patternD(uint32_t floor, uint32_t mask, uint32_t bits)
{
uint32_t x = floor;
if (((x ^ bits) & mask) == 0)
{
x = (((x | mask) 1) & ~mask) | bits;
}
else
{
uint32_t maskOfHighestViolation = 0x80000000U >> _lzcnt_u32((x ^ bits) & mask);
x = (((x | mask) (x & maskOfHighestViolation)) & ~mask) | bits;
x &= ~(maskOfHighestViolation - 1); // could be -maskOfHighestViolation
x |= bits;
}
return x;
}
CodePudding user response:
If you had a bit swizzling operation, this would be really easy:
Starting with
floor
, swizzle away the bits controlled by mask leaving only the variable bits. Let this bev(floor)
Compute
1 v(floor)
For the results of both step 1 and 2, swizzle the mask-controlled bits back in. Compare to
floor
, picking the lower feasible value.
However, bit interleaving ("swizzling") is not one of the fundamental operations on a CPU (With an FPGA it is trivial)