I have two integer values:
d_a = 6
and d_b = 3
, so-called distance between set bits.
Masks created with appropriate distance look like below:
uint64_t a = 0x1041041041041041; // 0001 0000 0100 0001 0000 0100 0001 0000
// 0100 0001 0000 0100 0001 0000 0100 0001
uint64_t b = 0x9249249249249249; // 1001 0010 0100 1001 0010 0100 1001 0010
// 0100 1001 0010 0100 1001 0010 0100 1001
The goal is to have a target
mask, which has bits set with d_b
, but simultaneously takes into account bits set in the a
mask (e.g. first set bit is shifted).
The second thing is that the distance in the target
mask is not constant i.e. number of zeros between set bits in the target
mask shall be equal to d_b
or increased whenever between them is set bit in a
uint64_t target = 0x4488912224488912; // 0100 0100 1000 1000 1001 0001 0010 0010
// 0010 0100 0100 1000 1000 1001 0001 0010
The picture to visualize the problem:
The blue bar is a
, yellow is b
.
I would rather use bit manipulation intrinsics than bit-by-bit operations.
edit: Actually, I have the following code, but I am looking for a solution with a fewer number of instructions.
void set_target_mask(int32_t d_a, int32_t d_b, int32_t n_bits_to_set, uint8_t* target)
{
constexpr int32_t n_bit_byte = std::numeric_limits<uint8_t>::digits;
int32_t null_cnt = -1;
int32_t n_set_bit = 0;
int32_t pos = 0;
while(n_set_bit != n_bits_to_set)
{
int32_t byte_idx = pos / n_bit_byte;
int32_t bit_idx = pos % n_bit_byte;
if(pos % d_a == 0)
{
pos ;
continue;
}
null_cnt ;
if(null_cnt % d_b == 0)
{
target[byte_idx] |= 1 << bit_idx;
n_set_bit ;
}
pos ;
}
}
CodePudding user response:
If target is uint64_t
, possible d_a
and d_b
can be converted into bit masks via look-up table. Like lut[6] == 0x2604D5C99A01041
from your question.
Look up tables can be initialized once per program run during initlalization, or in compile time using macro or constant expressions (constexpr
).
To make d_b
spread, skipping d_a
bits, you can use pdep
with inverted d_a
:
uint64_t tmp = _pdep_u64(d_b_bits, ~d_a_bits);
Then you can convert n_bits_to_set
to contiguous bits mask:
uint64_t n_bits = (1 << n_bits_to_set) - 1;
And spread them using pdep
again:
uint64_t tmp = _pdep_u64(n_bits, tmp);
(See Intrinsic Guide about pdep. Note that pdep is slow on AMD before Zen3. It's fast on Intel CPUs and Zen3, but not Bulldozer-family or Zen1/Zen2)