tl;dr: How do I implement a for loop that runs a timed function with std::index_sequence?
Okay, I'll admit that title is a little cryptic, but I was looking at this question: "is that possible to have a for loop in compile time with runtime or even compile?"
And I may have gotten too excited about what I could possibly do with std::index_sequence. I'll explain what my goal is. I want something like the following code:
for(int i = 1; i < 100000; i++)
{
auto start = time();
runOpenCL<i>();
std::cout << time() - start << std::endl;
}
to compile to this (with the timers for each one):
runOpenCL<1>();
runOpenCL<2>();
runOpenCL<3>();
...
runOpenCL<100000>();
Now I thought this should just work, right? After all, for loops are often unrolled at compile time (if that's the right phrase) in this way. However, I understand templates have safeguards against this kind of dodgy code (the template argument has to be known at compile time), and I saw that std::index_sequence could get around that, but I don't understand template code well enough to figure out what's going on. Before anyone says I could just make it a normal function parameter: yes, I could do that, if you look at the function itself:
template<int threadcount>
INLINE void runOpenCL()
{
constexpr int itemsPerThread = (MATRIX_HEIGHT + threadcount - 1) / threadcount;
// executing the kernel
clObjs.physicsKernel.setArg(2, threadcount);
clObjs.physicsKernel.setArg(3, itemsPerThread);
clObjs.queue.enqueueNDRangeKernel(clObjs.physicsKernel, cl::NullRange, cl::NDRange(threadcount), cl::NullRange);
clObjs.queue.finish();
// making sure OpenGL is finished with its vertex buffer
glFinish();
// acquiring the OpenGL object (vertex buffer) for OpenCL use
const std::vector<cl::Memory> glObjs = { clObjs.glBuffer };
clObjs.queue.enqueueAcquireGLObjects(&glObjs);
// copying the OpenCL buffer to the BufferGL
clObjs.queue.enqueueCopyBuffer(clObjs.outBuffer, clObjs.glBuffer, 0, 0, planets_size_points);
// releasing the OpenGL object
clObjs.queue.enqueueReleaseGLObjects(&glObjs);
}
but I don't want to. Do I need a better reason? I think it would be really cool to implement this, provided it is still readable in the end.
CodePudding user response:
Here is a possible version that unrolls the loop using a C++17 fold expression:
#include <type_traits>
#include <utility>
template <std::size_t I>
void runOpenCL();
template <std::size_t... Is>
void runAllImpl(std::index_sequence<Is... >) {
// thanks @Franck for the better fold expression
(runOpenCL<Is>(), ...);
}
void runAll() {
runAllImpl(std::make_index_sequence<10000>{});
}
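The version above unrolls the calls but leaves out the per-call timers the question asks for. Here is a minimal sketch of one way to add them, assuming std::chrono::steady_clock is an acceptable timer. The runOpenCL body is a hypothetical stand-in that only records its thread count, and g_calls, timeOne and runAllTimed are names made up for this example. The index is shifted by one because std::make_index_sequence<N> starts at 0, and the question's function divides by threadcount.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real kernel launcher from the question:
// it only records which thread counts were requested.
inline std::vector<std::size_t> g_calls;

template <std::size_t ThreadCount>
void runOpenCL() {
    static_assert(ThreadCount > 0, "thread count must be positive");
    g_calls.push_back(ThreadCount);
}

// Times a single instantiation and prints the elapsed microseconds.
template <std::size_t ThreadCount>
void timeOne() {
    const auto start = std::chrono::steady_clock::now();
    runOpenCL<ThreadCount>();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    std::cout << ThreadCount << ": "
              << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
              << " us\n";
}

template <std::size_t... Is>
void runAllTimedImpl(std::index_sequence<Is...>) {
    // Is runs over 0..N-1, so shift by one to get thread counts 1..N.
    // The comma fold evaluates the calls left to right.
    (timeOne<Is + 1>(), ...);
}

// Runs and times runOpenCL<1>() through runOpenCL<N>().
template <std::size_t N>
void runAllTimed() {
    runAllTimedImpl(std::make_index_sequence<N>{});
}
```

Calling runAllTimed<100>() would then print one line per thread count from 1 to 100. Keeping the timer in its own timeOne template means the fold expression itself stays a one-liner.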
Without C++17 you can do something like this, but in a non-optimized build you will get a huge stack blow-up (the dummy array holds one int per index):
#include <type_traits>
#include <utility>
template <std::size_t I>
void runOpenCL();
template <std::size_t... Is>
void runAllImpl(std::index_sequence<Is... >) {
int arr[]{ (runOpenCL<Is>(), 0)... };
(void)arr;
}
void runAll() {
runAllImpl(std::make_index_sequence<10000>{});
}
This seems to work with larger values than @康桓瑋's proposition, but (at least) GCC does not manage to compile it for 1000000 (10000 is "ok").
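One thing that might be worth trying for large counts is to split the range into chunks, so that no single pack expansion has to hold every index at once. The sketch below uses a C++17 fold expression, and I have not checked whether it actually gets GCC past the 1000000 case: every runOpenCL<I> still has to be instantiated, only the size of each std::index_sequence shrinks. The runOpenCL stub, g_calls, runChunk and runAllChunked are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real function: records the index it got.
inline std::vector<std::size_t> g_calls;

template <std::size_t I>
void runOpenCL() {
    g_calls.push_back(I);
}

// Expands one chunk: runOpenCL<Offset>() ... runOpenCL<Offset + Len - 1>().
template <std::size_t Offset, std::size_t... Is>
void runChunk(std::index_sequence<Is...>) {
    (runOpenCL<Offset + Is>(), ...);
}

// Expands the list of chunks; chunk number C starts at index C * ChunkSize.
template <std::size_t ChunkSize, std::size_t... Cs>
void runChunksImpl(std::index_sequence<Cs...>) {
    (runChunk<Cs * ChunkSize>(std::make_index_sequence<ChunkSize>{}), ...);
}

// Calls runOpenCL<0>() ... runOpenCL<Total - 1>() in order, ChunkSize at a time.
template <std::size_t Total, std::size_t ChunkSize>
void runAllChunked() {
    static_assert(ChunkSize > 0 && Total % ChunkSize == 0,
                  "this sketch assumes Total is a multiple of ChunkSize");
    runChunksImpl<ChunkSize>(std::make_index_sequence<Total / ChunkSize>{});
}
```

For example, runAllChunked<10000, 100>() expands 100 chunks of 100 calls each instead of a single pack of 10000.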
CodePudding user response:
You can generate a fixed-size function table at compile time and invoke the corresponding function in the table through a runtime index, for example like this (the templated lambda needs C++20):
#include <array>
#include <cstddef>
#include <utility>
template<std::size_t N>
constexpr auto gen_func_table = []<std::size_t... Is>
(std::index_sequence<Is...>) {
// unary + converts each lambda to a plain function pointer, so all elements share one type
return std::array{ +[] { runOpenCL<Is>(); }...};
}(std::make_index_sequence<N>{});
int main() {
constexpr std::size_t max_count = 100;
constexpr auto& func_table = gen_func_table<max_count>;
for(std::size_t i = 1; i < max_count; ++i)
func_table[i]();
}
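Because the table is indexed at runtime, the timers from the question can go in an ordinary loop. Below is a rough C++17 sketch of that idea using plain function pointers instead of lambdas. The runOpenCL stub, g_calls, KernelFn, makeTable, kernelTable and timeAll are all names invented for this example. Slot i holds runOpenCL<i + 1>, so runOpenCL<0> is never instantiated; with the question's function that instantiation would divide by zero inside a constexpr.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real kernel launcher: records its thread count.
inline std::vector<std::size_t> g_calls;

template <std::size_t ThreadCount>
void runOpenCL() {
    g_calls.push_back(ThreadCount);
}

using KernelFn = void (*)();

// Builds a table whose slot i holds runOpenCL<i + 1>, so slot 0 is thread count 1.
template <std::size_t... Is>
constexpr std::array<KernelFn, sizeof...(Is)> makeTable(std::index_sequence<Is...>) {
    return {{ &runOpenCL<Is + 1>... }};
}

template <std::size_t N>
inline constexpr auto kernelTable = makeTable(std::make_index_sequence<N>{});

// Runs and times thread counts 1..N through an ordinary runtime loop.
template <std::size_t N>
void timeAll() {
    for (std::size_t i = 0; i < N; ++i) {
        const auto start = std::chrono::steady_clock::now();
        kernelTable<N>[i]();
        const auto elapsed = std::chrono::steady_clock::now() - start;
        std::cout << (i + 1) << ": "
                  << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()
                  << " us\n";
    }
}
```

Calling timeAll<100>() would print one timing line per thread count. Note that the call through a function pointer is an indirect call, so the measured time may include a small overhead compared with the fully unrolled version.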