In C, which one is faster? Taking input, storing in an array and adding simultaneously, or taking input first and then adding later

Time:09-17

I have a scenario in which I have to take n inputs from the user, store them in an array, and also find the sum of all elements in the array. Since this is for Codeforces, I wanted to know the most efficient way of doing it.

Generally, I/O operations are considered notorious for using a lot of CPU time. With large-scale input (thousands of values), is there a performance hit from frequently switching between I/O and processing, compared with taking all the input first and then computing the sum afterwards?

I am using the C language and online judge (Codeforces).

First approach: I loop n times, reading each input from the stream and placing it in the array one by one. Later, I sum them all in a second loop.

int n, sum=0;
cin >> n;

int *myarray = new int[n];

for(int i=0; i<n; i++)
{
    cin >> myarray[i];
}

for(int i=0; i<n; i++)
{
    sum += myarray[i];
}

Second approach: I loop n times and read n inputs. On each iteration, I place the value in the array and also add it to the sum.

int n, sum=0;
cin >> n;

int *myarray = new int[n];

for(int i=0; i<n; i++)
{
    cin >> myarray[i];
    sum += myarray[i];
}

CodePudding user response:

In a program like this, where you have a single thread that takes the user input, stores it in an array and sums it, there'd be no (noticeable) difference - the same operations happen anyway.

The more obvious optimization would be to lose the array. If the required output is just the sum, you could sum the inputs directly without storing them:

int sum = 0;
for(int i=0; i<n; i++)
{
    int temp;
    cin >> temp;
    sum += temp;
}

CodePudding user response:

I suspect in practice, though, that cin is so slow compared to the other operations in your function that any difference is unlikely to be measurable.

That said if we ignore the slowness of cin and focus on the quality of the code generated by your function I see a couple of advantages to the single-loop approach.

  • The first example executes all the loop-control instructions twice; those instructions are cheap, but they are not free.
  • A memory location that was recently written is almost certain to still be in L1 cache; a memory location written during a previous loop may not be, if the number of iterations is large.

A possible downside of doing too many different things in one loop is that you can run out of callee-save registers, forcing the code to spill to the stack variables that could otherwise live in registers, but I don't think that will be an issue here on most architectures.

Another thing I noticed while looking at the compiled code on Godbolt: because cin takes a reference rather than returning a value, "n" can't live in a register across the function calls, so the loop calling cin has to reload it on every iteration.

CodePudding user response:

Strictly speaking, doing it all in one loop is faster, but the difference is not one you should worry about.

For Codeforces, you will probably want faster I/O, so I suggest using scanf and printf instead of cin and cout; they are known to be faster.

If you want to keep using cin and cout, you can add the following lines to make them faster:

ios_base::sync_with_stdio(false);
cin.tie(NULL);

Why?
When this code is translated to assembly, the loop ends in a conditional jump. The jump may be taken or not taken depending on the condition, and only once the condition is evaluated do we know which.
We don't want to wait for the condition to be evaluated, so CPUs use branch prediction (implemented in many ways), which does some computation to estimate whether the branch will be taken.
Until the condition is evaluated, we follow that assumption and execute the instructions speculatively, without committing their results to either registers or memory.

If the prediction is correct, those results are committed and execution continues naturally; if not, the pipeline is simply flushed (those instructions and their results are thrown away).

Still, performance is reduced by this misprediction penalty (the results we computed for nothing), which is why reducing control-flow instructions helps performance.

In your case, splitting the loop in two produces two conditional jump instructions instead of one, which increases the potential misprediction penalty.

Some compilers do loop unrolling to reduce exactly this kind of overhead.

As long as you are not working on HPC (high-performance computing) applications, or really require optimization at all costs, you don't need to worry about this.

Instead, focus on improving the algorithm itself. As @Mureinik said, if your problem only requires the sum, with no further use of the numbers themselves, you can just compute the sum on the fly.
