I have 2 arrays of a million elements each (created from an image, holding the brightness of each pixel). I need a single number: the sum of the products of corresponding elements, that is, A(1,1) * B(1,1) + A(1,2) * B(1,2) + ... In my loop, Python takes the last loop variable (j1) and runs through it, then adds 1 to the penultimate variable and runs through the last one again, and so on. How can I make it multiply only corresponding elements? res1 and res2 are arrays (specifically numpy.ndarray). Perhaps there is a ready-made function for this, but I need to make it as open (explicit) as possible, without a ready-made one.
sum = 0
for i in range(len(res1)):
    for j in range(len(res2[i])):
        for i1 in range(len(res2)):
            for j1 in range(len(res1[i1])):
                sum += res1[i][j]*res2[i1][j1]
Answer:
In the first part of my answer I'll explain how to fix your code directly. Your code is almost correct but contains one big logic mistake. In the second part I'll explain how to solve your problem using numpy. numpy is the standard Python package for dealing with arrays of numbers. If you're manipulating big arrays of numbers, there is no excuse not to use numpy.
Fixing your code
Your code uses 4 nested for-loops, with indices i and j to iterate on the first array, and indices i1 and j1 to iterate on the second array.
Thus you're multiplying every element res1[i][j] from the first array with every element res2[i1][j1] from the second array. This is not what you want: for two arrays of a million elements each, that is 10^12 multiplications instead of 10^6. You only want to multiply every element res1[i][j] from the first array with the corresponding element res2[i][j] from the second array, so you should use the same indices for both arrays. Thus there should only be two nested for-loops. Also note that the accumulation needs +=; a plain = would keep only the last product.
s = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        s += res1[i][j] * res2[i][j]
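To see concretely what the four-loop version computes, here is a small sketch on hypothetical 2x2 inputs (the values are made up so they can be checked by hand). Multiplying every element by every other element adds up to sum(res1) * sum(res2), whereas the two-loop version only pairs elements at the same position.

```python
# Tiny hypothetical 2x2 inputs, small enough to check by hand
res1 = [[1, 2], [3, 4]]
res2 = [[5, 6], [7, 8]]

# Four nested loops: every element of res1 times every element of res2
all_pairs = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        for i1 in range(len(res2)):
            for j1 in range(len(res2[i1])):
                all_pairs += res1[i][j] * res2[i1][j1]

# Two nested loops: only elements at the same position are multiplied
matching = 0
for i in range(len(res1)):
    for j in range(len(res1[i])):
        matching += res1[i][j] * res2[i][j]

print(all_pairs)  # 260 = (1+2+3+4) * (5+6+7+8)
print(matching)   # 70 = 1*5 + 2*6 + 3*7 + 4*8
```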
Note that I called the variable s instead of sum. This is because sum is the name of a builtin function in Python. Shadowing the name of a builtin is heavily discouraged. Here is the list of builtins: https://docs.python.org/3/library/functions.html ; do not name a variable with a name from that list.
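A short sketch of what goes wrong when the builtin is shadowed at module level: after sum = 0, the name refers to an int, so calling it fails until the shadowing name is deleted.

```python
sum = 0  # shadows the builtin function sum

try:
    sum([1, 2, 3])  # sum is now an int, not a function
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)  # 'int' object is not callable

del sum  # remove the shadowing global; the builtin is visible again
total = sum([1, 2, 3])
print(total)  # 6
```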
Now, in general, in Python we dislike using range(len(...)) in a for-loop. If you read the official tutorial and its section on for loops, it shows that a for-loop can iterate on elements directly, rather than on indices.
For instance, here is how to sum the elements of one array without using range(len(...)) and without using indices:
# sum the elements in an array
s = 0
for row in res1:
    for x in row:
        s += x
Here row is a whole row, and x is an element. We don't refer to indices at all.
Useful tools for looping are the builtin functions zip and enumerate:
enumerate can be used if you need access to both the elements and their indices;
zip can be used to iterate on two arrays simultaneously.
I won't show an example with enumerate, but zip is exactly what you need, since you want to iterate on two arrays:
s = 0
for row1, row2 in zip(res1, res2):
    for x, y in zip(row1, row2):
        s += x * y
You can also use the builtin function sum to write all of this without += and without the initial s = 0:
s = sum(x * y for row1, row2 in zip(res1, res2) for x, y in zip(row1, row2))
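To check that this one-liner agrees with numpy's vectorized version, and to get a rough feel for the speed difference, here is a timing sketch on hypothetical random integer data of the question's size (1000x1000, i.e. a million elements). The arrays are converted to nested lists for the pure-Python part; the timings depend entirely on the machine, so no numbers are claimed here.

```python
import time

import numpy as np

# Hypothetical test data: two 1000x1000 integer "brightness" arrays
rng = np.random.default_rng(0)
res1 = rng.integers(0, 256, size=(1000, 1000), dtype=np.int64)
res2 = rng.integers(0, 256, size=(1000, 1000), dtype=np.int64)

# Pure-Python version on nested lists
list1, list2 = res1.tolist(), res2.tolist()
t0 = time.perf_counter()
s_loop = sum(x * y for row1, row2 in zip(list1, list2) for x, y in zip(row1, row2))
t_loop = time.perf_counter() - t0

# numpy version: element-wise product, then sum
t0 = time.perf_counter()
s_numpy = (res1 * res2).sum()
t_numpy = time.perf_counter() - t0

print(s_loop == s_numpy)
print(f"loop: {t_loop:.3f} s, numpy: {t_numpy:.4f} s")
```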
Using numpy
As I mentioned in the introduction, numpy is a standard Python package for dealing with arrays of numbers. In general, operations on arrays using numpy are much, much faster than loops over arrays in core Python, because the looping happens in compiled code rather than in the Python interpreter. Plus, code using numpy is usually easier to read than code using core Python only, because there are a lot of useful functions and convenient notations. For instance, here is a simple way to achieve what you want:
import numpy as np
# convert to numpy arrays (np.asarray does not copy inputs that already are arrays)
res1 = np.asarray(res1)
res2 = np.asarray(res2)
# multiply elements with corresponding elements, then sum
s = (res1 * res2).sum()
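If res1 and res2 come straight from an 8-bit grayscale image (an assumption: the question only says they hold pixel brightness), their dtype is likely uint8. uint8 * uint8 stays uint8, so each product wraps around modulo 256 and the sum is wrong. A minimal sketch with hypothetical values shows the problem and one fix, casting to a wider integer type before multiplying. It also shows np.vdot, a ready-made NumPy function that flattens both arrays and returns the sum of products for real-valued input.

```python
import numpy as np

# Hypothetical 8-bit "brightness" arrays (assumption: image loaded as uint8)
res1 = np.array([[200, 100], [50, 255]], dtype=np.uint8)
res2 = np.array([[200, 100], [50, 255]], dtype=np.uint8)

# uint8 * uint8 stays uint8, so each product wraps around modulo 256
wrapped = (res1 * res2).sum()

# Casting to a wider integer type first gives the exact result
a = res1.astype(np.int64)
b = res2.astype(np.int64)
exact = (a * b).sum()

# Ready-made alternative: np.vdot flattens both arrays, then takes the dot product
via_vdot = np.vdot(a, b)

print(wrapped, exact, via_vdot)  # 277 117525 117525
```

The same wrap-around can affect the loop versions when they index into uint8 arrays, since the elements are then NumPy uint8 scalars, so the cast is worth doing before either approach.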
Answer:
Solution 1:
import numpy as np
a, b = np.arange(100), np.arange(100)
print((a * b).sum())
Solution 2 (more open, because it uses a pd.DataFrame):
import pandas as pd
import numpy as np
a, b = np.arange(100), np.arange(100)
# one column per array, then an element-wise product column
df = pd.DataFrame({'col1': a, 'col2': b})
df['vect_product'] = df.col1 * df.col2
print(df['vect_product'].sum())
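Solution 2 as written works on 1-D arrays, but DataFrame columns must be 1-D, so the question's 2-D image arrays would need to be flattened first. A minimal sketch, using small hypothetical 2-D arrays in place of the real images, flattens them with ravel():

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D arrays standing in for the image brightness arrays
res1 = np.arange(6).reshape(2, 3)
res2 = np.arange(6).reshape(2, 3)

# DataFrame columns must be 1-D, so flatten the 2-D arrays first
df = pd.DataFrame({'col1': res1.ravel(), 'col2': res2.ravel()})
df['vect_product'] = df.col1 * df.col2
print(df['vect_product'].sum())  # 55 = 0 + 1 + 4 + 9 + 16 + 25
```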