I need to truncate a float to the nearest power of 10 that does not exceed it. For example, 1.1 would truncate to 1.0 and 4.7e3 would truncate to 1e3. I am currently doing it with the seemingly complicated powf(10,floorf(log10f(x))). Is there a better-performing (as in faster execution speed) solution? My target CPU architectures are x86-64 and arm64.
#include <stdio.h>
#include <math.h>
int main()
{
    float x = 1.1e5f;
    while (x > 1e-6f)
    {
        float y = powf(10,floorf(log10f(x)));
        printf("%e ==> %g\n", x, y);
        x /= 5.0f;
    }
}
When run, this produces:
1.100000e+05 ==> 100000
2.200000e+04 ==> 10000
4.400000e+03 ==> 1000
8.800000e+02 ==> 100
1.760000e+02 ==> 100
3.520000e+01 ==> 10
7.040000e+00 ==> 1
1.408000e+00 ==> 1
2.816000e-01 ==> 0.1
5.632000e-02 ==> 0.01
1.126400e-02 ==> 0.01
2.252800e-03 ==> 0.001
4.505600e-04 ==> 0.0001
9.011199e-05 ==> 1e-05
1.802240e-05 ==> 1e-05
3.604480e-06 ==> 1e-06
CodePudding user response:
A lookup table can speed up the computation. This technique should work for all normal floating-point numbers. Subnormal numbers and NaN won't work without some dedicated logic; 0 and infinity can be handled by extreme values in the table.
Although I expect this technique to be faster than the original implementation, measurements are needed to confirm it.
The code uses C++20 std::bit_cast to extract the exponent from the float value. If it is not available, older techniques such as frexpf exist.
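As a rough sketch of those older techniques (not part of the answer's code), the exponent field can also be read with std::memcpy or recovered with std::frexp. The helper names biasedExponentMemcpy and biasedExponentFrexp are hypothetical, and the code assumes float is IEEE-754 binary32.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: biased 8-bit exponent field of a binary32 float,
// read with std::memcpy, which works before C++20.
inline std::uint8_t biasedExponentMemcpy(float val)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);
    // Shifting out the 23 mantissa bits leaves sign + exponent;
    // narrowing to 8 bits drops the sign.
    return static_cast<std::uint8_t>(bits >> 23);
}

// Hypothetical helper: the same field recovered with std::frexp.
// Only meaningful for normal, nonzero, finite values.
inline std::uint8_t biasedExponentFrexp(float val)
{
    int e = 0;
    std::frexp(val, &e);  // val == m * 2^e with 0.5 <= |m| < 1
    // The stored exponent uses a mantissa in [1, 2), so it is e - 1; the bias is 127.
    return static_cast<std::uint8_t>(e + 126);
}
```

The memcpy variant is usually compiled to a plain register move, while the frexp variant goes through a library call and may well be slower; it is shown only because the answer mentions it.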
#include <bit>
#include <cstdint>
#include <cstdio>
#include <limits>
constexpr float magnitudeLUT[] = {
0.f, 1e-38f, 1e-38f, 1e-38f, 1e-38f, 1e-37f, 1e-37f, 1e-37f, 1e-36f, 1e-36f, 1e-36f, 1e-35f,
1e-35f, 1e-35f, 1e-35f, 1e-34f, 1e-34f, 1e-34f, 1e-33f, 1e-33f, 1e-33f, 1e-32f, 1e-32f, 1e-32f,
1e-32f, 1e-31f, 1e-31f, 1e-31f, 1e-30f, 1e-30f, 1e-30f, 1e-29f, 1e-29f, 1e-29f, 1e-28f, 1e-28f,
1e-28f, 1e-28f, 1e-27f, 1e-27f, 1e-27f, 1e-26f, 1e-26f, 1e-26f, 1e-25f, 1e-25f, 1e-25f, 1e-25f,
1e-24f, 1e-24f, 1e-24f, 1e-23f, 1e-23f, 1e-23f, 1e-22f, 1e-22f, 1e-22f, 1e-22f, 1e-21f, 1e-21f,
1e-21f, 1e-20f, 1e-20f, 1e-20f, 1e-19f, 1e-19f, 1e-19f, 1e-19f, 1e-18f, 1e-18f, 1e-18f, 1e-17f,
1e-17f, 1e-17f, 1e-16f, 1e-16f, 1e-16f, 1e-16f, 1e-15f, 1e-15f, 1e-15f, 1e-14f, 1e-14f, 1e-14f,
1e-13f, 1e-13f, 1e-13f, 1e-13f, 1e-12f, 1e-12f, 1e-12f, 1e-11f, 1e-11f, 1e-11f, 1e-10f, 1e-10f,
1e-10f, 1e-10f, 1e-09f, 1e-09f, 1e-09f, 1e-08f, 1e-08f, 1e-08f, 1e-07f, 1e-07f, 1e-07f, 1e-07f,
1e-06f, 1e-06f, 1e-06f, 1e-05f, 1e-05f, 1e-05f, 1e-04f, 1e-04f, 1e-04f, 1e-04f, 1e-03f, 1e-03f,
1e-03f, 1e-02f, 1e-02f, 1e-02f, 1e-01f, 1e-01f, 1e-01f, 1e+00f, 1e+00f, 1e+00f, 1e+00f, 1e+01f,
1e+01f, 1e+01f, 1e+02f, 1e+02f, 1e+02f, 1e+03f, 1e+03f, 1e+03f, 1e+03f, 1e+04f, 1e+04f, 1e+04f,
1e+05f, 1e+05f, 1e+05f, 1e+06f, 1e+06f, 1e+06f, 1e+06f, 1e+07f, 1e+07f, 1e+07f, 1e+08f, 1e+08f,
1e+08f, 1e+09f, 1e+09f, 1e+09f, 1e+09f, 1e+10f, 1e+10f, 1e+10f, 1e+11f, 1e+11f, 1e+11f, 1e+12f,
1e+12f, 1e+12f, 1e+12f, 1e+13f, 1e+13f, 1e+13f, 1e+14f, 1e+14f, 1e+14f, 1e+15f, 1e+15f, 1e+15f,
1e+15f, 1e+16f, 1e+16f, 1e+16f, 1e+17f, 1e+17f, 1e+17f, 1e+18f, 1e+18f, 1e+18f, 1e+18f, 1e+19f,
1e+19f, 1e+19f, 1e+20f, 1e+20f, 1e+20f, 1e+21f, 1e+21f, 1e+21f, 1e+21f, 1e+22f, 1e+22f, 1e+22f,
1e+23f, 1e+23f, 1e+23f, 1e+24f, 1e+24f, 1e+24f, 1e+24f, 1e+25f, 1e+25f, 1e+25f, 1e+26f, 1e+26f,
1e+26f, 1e+27f, 1e+27f, 1e+27f, 1e+27f, 1e+28f, 1e+28f, 1e+28f, 1e+29f, 1e+29f, 1e+29f, 1e+30f,
1e+30f, 1e+30f, 1e+31f, 1e+31f, 1e+31f, 1e+31f, 1e+32f, 1e+32f, 1e+32f, 1e+33f, 1e+33f, 1e+33f,
1e+34f, 1e+34f, 1e+34f, 1e+34f, 1e+35f, 1e+35f, 1e+35f, 1e+36f, 1e+36f, 1e+36f, 1e+37f, 1e+37f,
1e+37f, 1e+37f, 1e+38f, 1e+38f, std::numeric_limits<float>::infinity() };
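float decimalMagnitude(float val)
{
    // Reinterpret the float's bits as an integer (C++20).
    uint32_t intVal = std::bit_cast<uint32_t>(val);
    // Shifting out the 23 mantissa bits leaves sign + exponent; narrowing to uint8_t drops the sign.
    uint8_t exponent = intVal >> 23;
    // A binade holds at most one power of ten, so the result is one of these two entries.
    if (val >= magnitudeLUT[exponent + 1])
        return magnitudeLUT[exponent + 1];
    else
        return magnitudeLUT[exponent];
}
int main()
{
    for (float v = 1e-38f; v < 1e38f; v *= 1.78)
        printf("%e => %e\n", v, decimalMagnitude(v));
}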
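The table does not have to be typed by hand. The following is a minimal sketch that builds a table with the same 257-entry layout at startup; powerOfTen, buildMagnitudeLUT and decimalMagnitudeGenerated are hypothetical names, and the code assumes IEEE-754 binary32 floats and a correctly rounding std::strtof. It uses std::memcpy instead of std::bit_cast, so it does not require C++20.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <string>

// Hypothetical helper: 10^p as a float, parsed from text so that it rounds
// the same way a literal such as 1e-38f would.
inline float powerOfTen(int p)
{
    return std::strtof(("1e" + std::to_string(p)).c_str(), nullptr);
}

// Hypothetical generator: entry e holds the largest power of ten that does
// not exceed 2^(e-127), the smallest value whose biased exponent is e.
inline std::array<float, 257> buildMagnitudeLUT()
{
    std::array<float, 257> lut{};
    lut[0] = 0.0f;  // zero and subnormals
    int p = -38;    // 1e-38f lies just below the smallest normal float
    for (int e = 1; e <= 254; ++e) {
        const float lo = std::ldexp(1.0f, e - 127);
        while (p < 38 && powerOfTen(p + 1) <= lo)
            ++p;
        lut[e] = powerOfTen(p);
    }
    lut[255] = lut[254];                                // exponent of infinity and NaN
    lut[256] = std::numeric_limits<float>::infinity();  // sentinel read as [exponent + 1]
    return lut;
}

// Same lookup as decimalMagnitude, but on the generated table.
inline float decimalMagnitudeGenerated(float val)
{
    static const std::array<float, 257> lut = buildMagnitudeLUT();
    std::uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);
    const std::uint8_t exponent = static_cast<std::uint8_t>(bits >> 23);
    return val >= lut[exponent + 1] ? lut[exponent + 1] : lut[exponent];
}
```

Generating the table costs a one-time initialization and a guarded static lookup on each call, so the constexpr table is likely the better choice for raw speed; the generator is mainly useful for checking the hand-written values.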
CodePudding user response:
I would say don't sweat it. Unless the program spends a large proportion of its time doing this truncation, it's not worth optimising what is probably already very fast. But if you want to optimise for your common cases (1e-2 <= x <= 10), you might try using 32-bit integer arithmetic to compare against the binary representations of 1e-2, 1e-1, 1, and 10 (for instance, 1e-1 is 0x3dcccccd); if x is outside that range, you can fall back on the floating-point version. Only experimentation will determine whether this actually runs faster.
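A minimal sketch of that idea, assuming IEEE-754 binary32 floats: for positive values the bit patterns sort in the same order as the values themselves, so unsigned integer comparisons can replace the float comparisons. The function name decimalMagnitudeFastPath is hypothetical, and the hex constants are the bit patterns of 1e-2f, 1e-1f, 1.0f and 10.0f.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: integer fast path for 1e-2 <= x <= 10,
// with the original floating-point expression as the fallback.
inline float decimalMagnitudeFastPath(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t b;
    std::memcpy(&b, &x, sizeof b);

    // Negative inputs have the sign bit set, so they compare above
    // 0x41200000 and take the fallback.
    if (b >= 0x3c23d70au && b <= 0x41200000u) {  // 1e-2f .. 10.0f
        if (b < 0x3dcccccdu) return 1e-2f;       // below 1e-1f
        if (b < 0x3f800000u) return 1e-1f;       // below 1.0f
        if (b < 0x41200000u) return 1.0f;        // below 10.0f
        return 10.0f;                            // exactly 10.0f
    }
    return std::pow(10.0f, std::floor(std::log10(x)));
}
```

Whether the extra branches beat the plain floating-point version depends on how often inputs land in the fast range and how well the branches predict, so this needs to be measured on both target architectures.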