I'm working on a function that is called about 240,000,000 times. Inside the function, I access three vectors exactly once each, like this:
aj = a_[j];
bj = b_[j];
cj = c_[j];
All three vectors are members of the same class, have 999 elements each, and hold values of type double. The job takes about 60 s to finish. But if I replace the vector accesses with three double variables, the time drops to about 10 s, like this:
aj = fa;
bj = fb;
cj = fc;
Switching from vectors to plain arrays helps only a little: it still takes about 50 s. Why is the time gap so large? I thought array access only involved an index calculation. Any ideas?
CodePudding user response:
The gap may be caused by cache filling. Try reordering the data so that the three values for each index sit next to each other in memory, as follows:
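// Keep the three values for index j adjacent in memory.
struct TT {
    double a_;
    double b_;
    double c_;
};
struct TT vector[999];      // one array of 999 structs instead of three arrays
struct TT* p = &vector[j];  // a single index calculation covers all three values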
You can use p to access a_, b_ and c_ as you need.
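For what it's worth, here is a minimal sketch of how that interleaved layout might look inside a class. The names Coeffs, Model, set and sum are made up for illustration and are not from the question; the real class will have its own members and its own per-call computation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical record type: the three values that are always read together.
struct Coeffs {
    double a;
    double b;
    double c;
};

// Hypothetical class standing in for the one in the question.
class Model {
public:
    static constexpr std::size_t kSize = 999;

    // Store the three values for index j in one record.
    void set(std::size_t j, double a, double b, double c) {
        data_[j] = Coeffs{a, b, c};
    }

    // One index calculation, then three loads from adjacent addresses.
    double sum(std::size_t j) const {
        const Coeffs& p = data_[j];
        return p.a + p.b + p.c;
    }

private:
    std::array<Coeffs, kSize> data_{};  // value-initialized to zeros
};
```

Whether this helps depends on the access pattern, so it is worth measuring before and after rather than assuming a speedup.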
CodePudding user response:
aj = a_[j]; will likely have to come from the L1 cache, considering the array size. But what about aj = fa;? Chances are that this doesn't even take 1 instruction. The compiler might simply note that the two variables have the same value. Thus, in later code that reads aj, the compiler simply reads fa instead.
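To illustrate why the two timings are not directly comparable, here is a rough sketch. The functions run_indexed and run_scalar are hypothetical stand-ins for the function in the question, and the summation is only an assumed placeholder for the real work.

```cpp
#include <cstddef>
#include <vector>

// Indexed version: every call has to load three values from memory.
// Assumes a, b and c have the same non-zero size.
double run_indexed(const std::vector<double>& a,
                   const std::vector<double>& b,
                   const std::vector<double>& c,
                   std::size_t calls) {
    double acc = 0.0;
    for (std::size_t i = 0; i < calls; ++i) {
        const std::size_t j = i % a.size();
        acc += a[j] + b[j] + c[j];
    }
    return acc;
}

// Scalar version: fa + fb + fc never changes inside the loop, so an
// optimizing compiler is free to compute it once and reuse the result.
double run_scalar(double fa, double fb, double fc, std::size_t calls) {
    double acc = 0.0;
    for (std::size_t i = 0; i < calls; ++i) {
        acc += fa + fb + fc;
    }
    return acc;
}
```

So the scalar run may be measuring far less work than the indexed run, rather than showing that the vector access itself is slow.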
Array index calculation on modern CPUs is close to free. x86 in particular can do an effective "load from Base plus Index * 8" in a single instruction.
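As a small sketch of what that addressing means in C++ terms (load and byte_offset are hypothetical helper names, and the assembly in the comment is what x86-64 compilers typically emit, not a guarantee):

```cpp
#include <cstddef>

// Reads element j. On x86-64 with optimization enabled this typically
// becomes a single load such as: movsd xmm0, [rdi + rsi*8]
double load(const double* base, std::size_t j) {
    return base[j];
}

// Byte distance from the start of the array to element j,
// i.e. j * sizeof(double), which is j * 8 where double is 8 bytes.
std::ptrdiff_t byte_offset(const double* base, std::size_t j) {
    return reinterpret_cast<const char*>(&base[j]) -
           reinterpret_cast<const char*>(base);
}
```

The multiply by 8 and the add are folded into the load's addressing mode, which is why the index calculation itself is unlikely to explain a 50 s gap.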