I have a numpy array total_weights, which is an IxI array of floats. Each row/column corresponds to one of I items.
During my main loop I acquire another float array, weights, of size NxM (N, M < I), where each row/column also corresponds to one of the original I items (duplicates may also exist).
I want to add this array to total_weights. However, the sizes and order of the two arrays are not aligned. Therefore, I maintain a position map called pos_df, a pandas Series that maps item IDs (its index) to their proper index/position in total_weights.
To make the addition properly, I perform the following operation inside the loop:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
total_weights[candidate_pos, :][:, rated_pos] = weights
Unfortunately, the above operation must be editing a copy of the original total_weights matrix and not a view of it, since after the loop the total_weights array is still full of zeroes. How do I make it change the original data?
Edit:
I want to clarify that candidate_IDs are the N IDs of items and rated_IDs are the M IDs of items in the NxM array called weights. Through pos_df I can get their position in the overall ordering of all I items.
Also, my guess as to the reason a copy is returned is that candidate_IDs, and thus candidate_pos, will probably contain duplicates, e.g. [0, 1, 3, 1, ...]. So the same rows will sometimes have to be pulled into the new array/view.
CodePudding user response:
Your first problem is in how you are using indexing. As candidate_pos is an array, total_weights[candidate_pos, :] is a fancy indexing operation that returns a new array. When you apply indexing again, i.e. ...[:, rated_pos], you are assigning elements to the newly created array rather than to total_weights.
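A minimal sketch of the difference, using small made-up arrays (the sizes and index values below are assumptions for illustration, not the asker's data). The chained form writes into a temporary copy, while a single indexing expression built with np.ix_ writes into the array itself:

```python
import numpy as np

# Hypothetical toy data: I = 4, N = 2, M = 3, no duplicate positions
total_weights = np.zeros((4, 4))
weights = np.arange(1.0, 7.0).reshape(2, 3)
candidate_pos = np.array([0, 2])
rated_pos = np.array([1, 2, 3])

# Chained indexing: the first [...] returns a new array,
# so the assignment lands in that temporary and is then discarded.
chained = total_weights.copy()
chained[candidate_pos, :][:, rated_pos] = weights

# One indexing expression: np.ix_ builds an open mesh from the two
# position arrays, and the assignment goes straight into `direct`.
direct = total_weights.copy()
direct[np.ix_(candidate_pos, rated_pos)] = weights
```

Here chained stays all zeros, while direct holds the values of weights at the selected rows and columns. This only covers the copy problem; it does not yet deal with duplicate positions.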
The second problem, as you have already spotted, is in the actual logic you are trying to apply. If I understand your example correctly, you have an I x I matrix of weights, and you want to update the weights for a sequence of pairs ((Ix_1, Iy_1), ..., (Ix_N, Iy_N)), with repetitions, in a single line of code. This can't be done this way using the = operator, because each total_weights[Ix_n, Iy_n] ends up holding only the weight from the last time (Ix_n, Iy_n) appears in your sequence. You have to first merge all the repeating elements in your sequence of weight updates, and then update your total_weights matrix with the new "unique" sequence of updates. Alternatively, you must collect your weights as an I x I matrix and add it directly to total_weights.
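One possible way to get the merged result without merging by hand is NumPy's unbuffered np.add.at, which is a different technique from the two options described above. The sketch below assumes the goal is to accumulate (add) every contribution rather than overwrite; the toy arrays and the names assigned and accumulated are made up for illustration:

```python
import numpy as np

# Hypothetical toy data: row position 1 appears twice in candidate_pos
total_weights = np.zeros((4, 4))
weights = np.ones((3, 2))
candidate_pos = np.array([0, 1, 1])
rated_pos = np.array([2, 3])

# Plain assignment: a repeated position keeps a single value,
# so the duplicate contributions are not summed.
assigned = total_weights.copy()
assigned[np.ix_(candidate_pos, rated_pos)] = weights

# Unbuffered in-place add: every occurrence of a repeated
# position contributes to the total.
accumulated = total_weights.copy()
np.add.at(accumulated, np.ix_(candidate_pos, rated_pos), weights)
```

With this data, assigned[1, 2] is 1.0 while accumulated[1, 2] is 2.0, because row position 1 received two contributions. Note that accumulated[...] += weights with the same fancy index would not accumulate duplicates, which is why np.add.at is used here.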
CodePudding user response:
After @rveronese pointed out that it's impossible to do it in one go because of the duplicates in candidate_pos, I believe I have managed to do what I want with a for-loop over them:
candidate_pos = pos_df.loc[candidate_IDs] # don't worry about how I get these
rated_pos = pos_df.loc[rated_IDs] # ^^
for i, c in enumerate(candidate_pos):
    total_weights[c, rated_pos] = weights[i, :]
In this case, the indexing does not create a copy and the assignment should be working as expected...