I want to inline the function MyClass:at()
, but performance isn't as I expect.
MyClass.cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <string>
// Making this a lot shorter than in my actual program
std::vector<std::vector<int>> arrarr =
{
{ 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48},
{24, 47, 32, 60, 99, 3, 45, 2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50},
{32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70},
{67, 26, 20, 68, 2, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21},
{24, 55, 58, 5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72},
{21, 36, 23, 9, 75, 0, 76, 44, 20, 45, 35, 14, 0, 61, 33, 97, 34, 31, 33, 95},
{78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 3, 80, 4, 62, 16, 14, 9, 53, 56, 92},
{16, 39, 5, 42, 96, 35, 31, 47, 55, 58, 88, 24, 0, 17, 54, 24, 36, 29, 85, 57},
{86, 56, 0, 48, 35, 71, 89, 7, 5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58},
{19, 80, 81, 68, 5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 4, 89, 55, 40},
{ 4, 52, 8, 83, 97, 35, 99, 16, 7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66},
{88, 36, 68, 87, 57, 62, 20, 72, 3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69},
{ 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36},
{20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 4, 36, 16},
{20, 73, 35, 29, 78, 31, 90, 1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 5, 54},
{ 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 1, 89, 19, 67, 48},
};
class MyClass
{
public:
MyClass(std::vector<std::vector<int>> arr) : arr(arr)
{
rows = arr.size();
cols = arr.at(0).size();
}
inline auto at(int row, int col) const { return arr[row][col]; }
void arithmetic(int n) const;
private:
std::vector<std::vector<int>> arr;
int rows;
int cols;
};
MyClass.cpp
:
void MyClass::arithmetic(int n) const
{
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::duration;
using std::chrono::milliseconds;
auto t1 = high_resolution_clock::now();
int highest_product = 0;
for (auto y = 0; y < rows; y)
{
for (auto x = 0; x < cols; x)
{
// Horizontal product
if (x n < cols)
{
auto product = 1;
for (auto i = 0; i < n; i)
{
product *= at(y, x i);
}
highest_product = std::max(highest_product, product);
}
}
}
auto t2 = high_resolution_clock::now();
duration<double, std::milli> ms_double = t2 - t1;
std::cout << ms_double.count() << "ms\n";
return highestProduct;
};
Now what I want know is why do I get better performance when I replace product *= at(y, x i);
with product *= arr[y][x i];
? When I test it with the first case, the timing on my large array takes roughly 6.7ms
, and the second case takes 5.3ms
. I thought when I inlined the function, it should be the same implementation as the second case.
CodePudding user response:
Member function directly defined in the class definition (typically in header files) are implicitly inlined so using inline
is useless in this case. inline
do not guarantee the function is inlined. It is just an hint for the compiler. The keyword is also an important during the link to avoid the multiple-definition issue. Function that are not make inline
can still be inlined if the compiler can see the code of the target function (ie. it is in the same translation unit or link time optimization are applied). For more information about this, please read Why are class member functions inlined?
Note that the inlining is typically performed in the optimization step of compilers (eg. -O1
//O1
). Thus without optimizations, most compilers will not inline the function.
Using std::vector<std::vector<int>>
is not efficient since it is not a contiguous data structure and it require 2 indirection to access an item. Two sub-vectors next to each other can be stored far away in memory likely causing more cache misses (and/or thrashing due to the alignment). Please consider using one big flatten array and access items using y*cols x
where cols
is the size of the sub-vectors (20 here). Alternatively a int[16][20]
data type should do the job well if the size if fixed at compile-time.
MyClass(std::vector<std::vector<int>> arr)
cause the input parameter to be copied (and so all the sub-vectors). Please consider using a const std::vector<std::vector<int>>&
type.
While at
is convenient for checking bounds at runtime, this feature can strongly decrease performance. Consider using the operator []
if you do not need that. You can use assertions combined with flatten arrays so to get a fast code in release and a safe code in debug (you can enable/disable them by defining the NDEBUG
macro).