Dictionary entries are all the same

I am using nested loops to fill the entries in a dictionary. Each entry in the dictionary contains a Numpy array. Each Numpy array contains three values.

Each dictionary entry is supposed to contain a unique Numpy array, but when the script is run, all 169 of the dictionary entries are being filled with the exact same Numpy array. Each one should be different. After trying to debug, I realized that the repeated entry is actually the correct value for entry #168 (the very last one).

I'm not sure why this is happening, because print(IC_sub_units) outputs the correct values in the console, but when I store them in the dictionary, all entries turn out the same. I'm guessing the issue is in the last line of code: IC_units[i] = IC_sub_units

I've been trying to solve this for hours on end. It's probably something really simple. Any ideas?

# Determining incremental costs (rise/run)
IC_sub_units = np.zeros(3)
IC_units = {}
for i in range(169):
    for j in range(1, 4):
        IC_sub_units[j-1] = (y[i][j]-y[i][j-1])/(x[i][j]-x[i][j-1])
    print(IC_sub_units)
    IC_units[i] = IC_sub_units

Printing IC_units produces the following output for dictionary entries 0 through 168. The value shown below is the correct value for entry 168, but for some reason, it's appearing in every dictionary entry.

 157: array([40.83088018, 42.50615291, 44.18142564]),
 158: array([40.83088018, 42.50615291, 44.18142564]),
 159: array([40.83088018, 42.50615291, 44.18142564]),
 160: array([40.83088018, 42.50615291, 44.18142564]),
 161: array([40.83088018, 42.50615291, 44.18142564]),
 162: array([40.83088018, 42.50615291, 44.18142564]),
 163: array([40.83088018, 42.50615291, 44.18142564]),
 164: array([40.83088018, 42.50615291, 44.18142564]),
 165: array([40.83088018, 42.50615291, 44.18142564]),
 166: array([40.83088018, 42.50615291, 44.18142564]),
 167: array([40.83088018, 42.50615291, 44.18142564]),
 168: array([40.83088018, 42.50615291, 44.18142564])}

CodePudding user response：

Just move the definition of IC_sub_units into the first loop:

IC_units = {}
for i in range(169):
    IC_sub_units = np.zeros(3)
    for j in range(1, 4):
        IC_sub_units[j-1] = (y[i][j]-y[i][j-1])/(x[i][j]-x[i][j-1])
    print(IC_sub_units)
    IC_units[i] = IC_sub_units

CodePudding user response：

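You are not allocating a new array at every iteration. In your example, the only allocation is the single np.zeros call before the loop. That call happens once, when that line runs, not every time you reference the name IC_sub_units. Every dictionary entry therefore refers to the same array object, which ends up holding the values written during the last iteration.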
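A minimal sketch of that aliasing, using made-up names (buf, shared, copied) rather than the data from the question. It also shows that storing a copy is another way to get independent entries, assuming you want to keep the single pre-allocated buffer:

```python
import numpy as np

# Hypothetical stand-in for the question's loop: one buffer reused on every pass.
buf = np.zeros(3)
shared = {}
for i in range(3):
    buf[:] = i          # overwrite the same buffer in place
    shared[i] = buf     # stores a reference to buf, not a snapshot of its values

# Every value in the dictionary is the very same array object.
same_object = all(v is buf for v in shared.values())
print(same_object)      # True
print(shared[0])        # [2. 2. 2.]  (values from the last iteration)

# Storing a copy gives each key its own independent array.
copied = {}
for i in range(3):
    buf[:] = i
    copied[i] = buf.copy()
print(copied[0], copied[2])
```

The print inside the original loop looked correct because it ran before the next iteration overwrote the buffer.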

To fix this, allocate a new array inside the loop. However, in this case you don't need np.zeros or the nested for loop at all. As a rule of thumb, you should avoid explicit Python loops over the elements of numpy arrays.

Here is how to vectorize the computation and allocate the buffer you need all at once:

IC_units = {}
for i in range(169):
    IC_units[i] = np.diff(y[i]) / np.diff(x[i])

You can write this as a comprehension:

IC_units = {i: np.diff(y[i]) / np.diff(x[i]) for i in range(169)}
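As a quick sanity check, here is a sketch comparing the np.diff version against the original nested loop. The x and y values are invented sample data with two rows of four points each, and slopes_loop is a hypothetical helper that mirrors the question's inner loop. Note that np.diff returns one fewer value than its input, so the two versions only agree when each row has exactly four points:

```python
import numpy as np

# Made-up sample data: 2 rows of 4 points each (the question has 169 rows).
x = [[0.0, 1.0, 2.0, 4.0], [0.0, 2.0, 3.0, 5.0]]
y = [[0.0, 2.0, 6.0, 8.0], [1.0, 5.0, 6.0, 12.0]]

def slopes_loop(xi, yi):
    """Rise over run between consecutive points, as in the question's inner loop."""
    out = np.zeros(3)   # a fresh array on every call
    for j in range(1, 4):
        out[j - 1] = (yi[j] - yi[j - 1]) / (xi[j] - xi[j - 1])
    return out

loop_units = {i: slopes_loop(x[i], y[i]) for i in range(len(x))}
diff_units = {i: np.diff(y[i]) / np.diff(x[i]) for i in range(len(x))}

# Both approaches should produce the same slopes for every key.
matches = all(np.allclose(loop_units[i], diff_units[i]) for i in loop_units)
print(matches)
print(diff_units)
```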

If x and y are 2-D numpy arrays rather than lists as your indexing implies, the problem is even simpler. You can precompute all the slopes up front:

IC_sub_units = np.diff(y, axis=1) / np.diff(x, axis=1)
IC_units = {i: IC_sub_units[i] for i in range(169)}

You could also phrase it as

IC_units = dict(enumerate(IC_sub_units))

But at that point, you should ask yourself whether something indexed by consecutive integers even needs to be in a dictionary. For a given i, IC_sub_units[i] holds the same values the dictionary would return for i.
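A small sketch of the 2-D case, assuming x and y are arrays of shape (rows, 4). The sample values and the names slopes and as_dict are made up for illustration:

```python
import numpy as np

# Made-up sample data: 2 rows of 4 points each, stored as 2-D arrays.
x = np.array([[0.0, 1.0, 2.0, 4.0], [0.0, 2.0, 3.0, 5.0]])
y = np.array([[0.0, 2.0, 6.0, 8.0], [1.0, 5.0, 6.0, 12.0]])

# One vectorized call computes every row's slopes at once.
slopes = np.diff(y, axis=1) / np.diff(x, axis=1)
print(slopes.shape)     # (2, 3)

# A dictionary keyed 0..n-1 adds nothing that row indexing does not already give.
as_dict = dict(enumerate(slopes))
print(np.array_equal(as_dict[1], slopes[1]))
```

One thing to keep in mind with this form: the dictionary values are views into slopes, so modifying slopes in place also changes what the dictionary returns.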