I need to map int values 1,2,3 to chars 'C', 'M', 'A'
Whats the fastest way (this will be called 100s times per sec 24/7)?
a macro or an inline function and bunch of ?: operators or ifs or switch? or an array?
CodePudding user response:
A lookup-table seems the most obvious approach as it is also branch-free:
inline constexpr char map(std::size_t i) {
constexpr char table[] = "0CMA";
// if in doubt add bounds checking for i, but it will cost performance
return table[i];
}
Observe that with optimisation, the lookup table boils down to an integer constant.
Edit: You can shave of an additional instruction if you are less lazy than me and specify the lookup table thusly:
constexpr char table[] = {0, 'M', 'C', 'A'};
CodePudding user response:
My proposal: (char)(0x414D4300 >> (i * 8))
Instead of 0x414D4300 you could write (('C' << 8) | ('M' << 16) | ('A' << 24))
.
Only actual testing will tell you whether this is faster than bitmask's answer.
If this runs only 100 times per second, you're chasing a red herring, anyway. Even if you write this section in the dumbest way possible, it seems to be a million times faster than the rest of your program.
CodePudding user response:
You're optimizing your code prematurely. Let me prove it. Here's the naivest, slowest approach that is still reasonable:
#include <map>
char map(int i) {
std::map<int, char> table;
table[1] = 'C';
table[2] = 'M';
table[3] = 'A';
return table[i];
}
It generates hundreds of lines of assembly instructions.
And let's time it for a million elements:
#include <iostream>
#include <map>
#include <chrono>
#include <cstdlib>
#include <vector>
char map(int i) {
std::map<int, char> table;
table[1] = 'C';
table[2] = 'M';
table[3] = 'A';
return table[i];
}
int main() {
int const count = 1'000'000;
std::vector<int> elms(count);
std::srand(0); // I know that rand is no good, but it's OK for this example
for (int i = 0; i < count; i) {
elms[i] = std::rand() % 3 1;
}
auto beg = std::chrono::high_resolution_clock::now();
volatile int do_not_optimize;
for (int i = 0; i < count; i) {
do_not_optimize = map(elms[i]);
}
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - beg);
std::cout << "Duration: " << duration.count() << " ms" << std::endl;
}
Here's the output on my computer with lots of other programs running alongside:
Duration: 246 ms
This terrible implementation under terrible conditions still meets your needs 40'650x over. I bet you have way more to worry about elsewhere in your codebase. Remember to make it right, then make it fast only if it's not already fast enough.