Changing the values of sliced numpy array doesn't change the original data in it

Time:08-30

I have a NumPy array total_weights, which is an IxI array of floats. Each row/column corresponds to one of I items.

During my main loop I acquire another float array, weights, of size NxM (N, M < I), where each row/column also corresponds to one of the original I items (duplicates may also exist).

I want to add this array to total_weights. However, the sizes and order of the two arrays are not aligned. Therefore, I maintain a position map called pos_df: a pandas Series whose index holds the item IDs and whose values are the corresponding positions in total_weights.
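For readers unfamiliar with this kind of position map, here is a minimal sketch. The item IDs are hypothetical, and it assumes the IDs in pos_df's index are unique; the lookup list itself may repeat IDs.

```python
import numpy as np
import pandas as pd

# Hypothetical item IDs, one per row/column of total_weights (I = 4 here).
item_ids = ["a", "b", "c", "d"]

# pos_df maps each item ID (the index) to its row/column position in total_weights.
pos_df = pd.Series(np.arange(len(item_ids)), index=item_ids)

# Looking up a list of IDs (duplicates allowed) returns their positions in lookup order.
candidate_IDs = ["c", "a", "c"]
candidate_pos = pos_df.loc[candidate_IDs].to_numpy()
print(candidate_pos)  # [2 0 2]
```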

To make the addition, I perform the following operation inside the loop:

candidate_pos = pos_df.loc[candidate_IDs]     # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs]             # ^^
total_weights[candidate_pos, :][:, rated_pos]  = weights

Unfortunately, the operation above must be modifying a copy of the original total_weights matrix rather than a view of it, since after the loop total_weights is still full of zeros. How do I make it change the original data?

Edit: to clarify, candidate_IDs are the N item IDs for the rows and rated_IDs are the M item IDs for the columns of the NxM array weights. Through pos_df I can get their positions among all I items.

Also, my guess is that a copy is returned because candidate_IDs, and thus candidate_pos, will probably contain duplicates, e.g. [0, 1, 3, 1, ...], so the same rows will sometimes have to be pulled into the new array/view.

Answer:

Your first problem is how you are using indexing. Because candidate_pos is an array, total_weights[candidate_pos, :] is a fancy (advanced) indexing operation, and fancy indexing always returns a new array, never a view, whether or not candidate_pos contains duplicates. When you then index that result with [:, rated_pos] and assign to it, you are writing into the newly created array rather than into total_weights.
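A small, self-contained illustration of the difference, using made-up positions and weights. The single-expression fix with np.ix_ is an addition for illustration, not something stated in the answer; note that it still uses plain assignment, so it does not accumulate repeated positions.

```python
import numpy as np

total_weights = np.zeros((4, 4))
candidate_pos = np.array([2, 0])   # hypothetical row positions (no duplicates here)
rated_pos = np.array([1, 3])       # hypothetical column positions
weights = np.array([[1.0, 2.0],
                    [3.0, 4.0]])

# Chained indexing: the first [...] is fancy indexing and returns a copy,
# so the assignment only modifies that temporary copy.
total_weights[candidate_pos, :][:, rated_pos] = weights
print(total_weights.sum())  # 0.0

# One indexing expression on the left-hand side calls total_weights.__setitem__,
# so it writes in place. np.ix_ builds the open mesh selecting the
# (row, column) sub-block.
total_weights[np.ix_(candidate_pos, rated_pos)] = weights
print(total_weights)
```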

The second problem, as you have already spotted, is in the logic you are trying to apply. If I understand your example correctly, you have an I x I matrix of weights, and you want to update the weights for a sequence of pairs ((Ix_1, Iy_1), ..., (Ix_N, Iy_N)) that contains repetitions, with a single line of code. This can't be done with the = operator: each position total_weights[Ix_n, Iy_n] ends up holding only one of its weights (in practice the one from the last time (Ix_n, Iy_n) appears in your sequence), not their sum. You have to first merge all the repeated elements in your sequence of weight updates, and then update your weights matrix with the resulting sequence of unique updates. Alternatively, you can collect your weights in an I x I matrix and add it directly to total_weights.
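The answer does not name a specific tool for merging repeated updates. One option, swapped in here and not taken from the answer, is NumPy's unbuffered np.add.at, which accumulates every (row, column) pair even when it repeats. The sketch below uses made-up positions and contrasts it with a buffered += through fancy indexing. NumPy does not guarantee which single write wins for a repeated position in the buffered case, so that value is not given an expected output.

```python
import numpy as np

I = 4
candidate_pos = np.array([0, 1, 0])   # hypothetical: row 0 appears twice
rated_pos = np.array([2, 3])          # hypothetical column positions
weights = np.array([[1.0, 1.0],
                    [2.0, 2.0],
                    [5.0, 5.0]])

# Buffered fancy-index update: each repeated target is written only once,
# so one of the two contributions to row 0 is silently dropped.
buffered = np.zeros((I, I))
buffered[np.ix_(candidate_pos, rated_pos)] += weights

# Unbuffered update: np.add.at adds every (row, col) contribution,
# including the repeated ones.
unbuffered = np.zeros((I, I))
np.add.at(unbuffered, np.ix_(candidate_pos, rated_pos), weights)

print(buffered[0, 2])
print(unbuffered[0, 2])  # 6.0
```

Applying np.add.at to a zero I x I matrix and then adding that matrix to total_weights would also match the answer's second alternative.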

Answer:

After @rveronese pointed out that it's impossible to do it in one go because of the duplicates in candidate_pos, I believe I have managed to do what I want with a for-loop over them:

candidate_pos = pos_df.loc[candidate_IDs]     # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs]             # ^^
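for i, c in enumerate(candidate_pos):
    # one indexing expression on total_weights, so this writes in place;
    # += accumulates, matching the goal of adding weights to total_weights
    total_weights[c, rated_pos] += weights[i, :]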

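In this case, total_weights[c, rated_pos] is a single indexing operation on the left-hand side of the assignment, so NumPy writes directly into total_weights rather than into a temporary copy, and each duplicate in candidate_pos gets its own loop iteration. The assignment should now work as expected.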
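A quick check of the row-by-row approach on toy data. It assumes the goal is to accumulate, as the question states, so it uses += rather than =; the positions and weights are made up. Duplicate rows are fine here because each occurrence gets its own iteration, but duplicates inside rated_pos would still be collapsed within a single row update.

```python
import numpy as np

I = 4
total_weights = np.zeros((I, I))
candidate_pos = np.array([0, 1, 0])   # hypothetical, with a duplicate row
rated_pos = np.array([2, 3])          # hypothetical, no duplicate columns
weights = np.array([[1.0, 1.0],
                    [2.0, 2.0],
                    [5.0, 5.0]])

# Each iteration indexes total_weights directly (no chained indexing),
# so the in-place add lands in the original array.
for i, c in enumerate(candidate_pos):
    total_weights[c, rated_pos] += weights[i, :]

print(total_weights[0, 2], total_weights[1, 3])  # 6.0 2.0
```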
