Python Numpy - Slicing assignment not assigning correctly

Time:10-26

I have a 2D numpy array called arm_resets that holds positive integers. Every value in the first column is < 360. For all columns other than the first, I need to replace every value of 360 or more with the value in the same row of the first column. I thought this would be relatively easy to do; here's what I have:

i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])

And here's what prints:

[3600 3609 3608 ... 3600 3611 3605]
[ 0  9  8 ...  0 11  5]
[3600 3609 3608 ... 3600 3611 3605]

Every value shown in the first print (the first three and the last three) is above 360, so the third print should show the values from the second print instead. Why is the assignment not taking effect?

edit: reproducible example:

import numpy as np
import pandas as pd

df = pd.DataFrame({"start":[1,2,3,4],"freq":[1,5,6,9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets,((0,0),(0,periods-1)))
for i in range(1,periods):
    arm_resets[:,[i]] = arm_resets[:,[i-1]] + freq
    #over_360 = arm_resets[:,[i]] >= periods
    #arm_resets[:,[i]][over_360] = arm_resets[:,[0]][over_360]
arm_resets

With those two lines commented out, here's what prints:

array([[ 1,  2,  3,  4,  5,  6],
       [ 2,  7, 12, 17, 22, 27],
       [ 3,  9, 15, 21, 27, 33],
       [ 4, 13, 22, 31, 40, 49]])

What I would expect:

array([[1, 2, 3, 4, 5, 1],
       [2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4]])

Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which are filled in, so in this example I'd want this:

array([[0, 1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])

The code I use to achieve this from the above arm_resets is this:

fin = np.zeros((len(arm_resets),periods),dtype=int)
for i in range(len(arm_resets)):
    fin[i,arm_resets[i]] = 1

CodePudding user response:

You can use np.where()

first_col = arm_resets[:,0] # first column, shape (n,)
first_col = first_col.reshape(first_col.size,1) # reshape to a 2D column, shape (n, 1)
arm_resets = np.where(arm_resets >= 360,first_col,arm_resets)

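The np.where documentation covers the details, but basically it evaluates arm_resets >= 360 element by element: where the condition is true it takes the value from first_col, and where it is false it keeps the value from arm_resets. Because first_col has shape (n, 1), broadcasting stretches it across every column, so each row is filled from its own first-column value.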

Edit: As suggested by Mad Physicist, you can use arm_resets[:,0,None] directly instead of creating the first_col variable.

arm_resets = np.where(arm_resets >= 360,arm_resets[:,0,None],arm_resets)
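
As a minimal sketch of the broadcasting described above, here is the same np.where call on a small made-up array (the values are hypothetical and chosen only so the effect is easy to see):

```python
import numpy as np

# Hypothetical 3x4 array; column 0 holds the fallback values (all < 360).
arm_resets = np.array([
    [ 5, 100, 400, 359],
    [10, 360,  20, 999],
    [15,  30,  45,  60],
])

# arm_resets[:, 0, None] has shape (3, 1), so it broadcasts across all 4 columns.
first_col = arm_resets[:, 0, None]

# Where the condition holds, take the row's first-column value; otherwise keep the original.
result = np.where(arm_resets >= 360, first_col, arm_resets)
print(result)
```

Note that np.where returns a new array rather than modifying arm_resets in place, which is why the answer assigns the result back to the name.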

CodePudding user response:

The slice arm_resets[:, [i]] is a fancy index, and therefore makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. The examples below assume over_360 is a 1-D mask, for example over_360 = arm_resets[:, i] >= 360, rather than the (n, 1) mask built in the question. If you want to assign to the mask, call __setitem__ on the sliced object directly:

arm_resets[over_360, [i]] = ...

You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies:

arm_resets[over_360, i] = ...

With slicing, even the following should work, since it calls __setitem__ on a view:

arm_resets[:, i][over_360] = ...
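
The copy-versus-view difference above can be checked with a small sketch. The array a and the names lost, direct and via_view are made up for illustration, and the mask is 1-D:

```python
import numpy as np

a = np.array([[1, 400], [2, 500], [3, 50]])
i = 1
mask = a[:, i] >= 360  # 1-D boolean mask, shape (3,)

# A list index produces a copy; an integer index produces a view.
print(np.shares_memory(a, a[:, [i]]))  # copy
print(np.shares_memory(a, a[:, i]))    # view

# Chained assignment through the fancy-indexed copy: the write is discarded.
lost = a.copy()
lost[:, [i]][mask] = lost[:, [0]][mask]

# Direct assignment: a single __setitem__ call on the array itself.
direct = a.copy()
direct[mask, i] = direct[mask, 0]

# Chained assignment through a view: the write reaches the original array.
via_view = a.copy()
via_view[:, i][mask] = via_view[mask, 0]

print(lost)
print(direct)
print(via_view)
```

Only lost is left unchanged; the other two end up with the first-column values written into column i.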

Indexing a single column i still means looping over the columns. In fact, you can process the entire matrix in one step, without looping, if you use index arrays rather than a boolean mask. Index arrays are useful here because the row indices let you pick the matching item from the first column:

rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
# cols is relative to the slice that starts at column 1, so shift it back by one
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
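
A runnable sketch of this whole-matrix approach on a small made-up array follows. It assumes the replacement values come from column 0, as the question describes, and it shifts the column indices by one because np.nonzero is applied to the slice arm_resets[:, 1:]:

```python
import numpy as np

# Hypothetical data; column 0 holds the fallback values (all < 360).
arm_resets = np.array([
    [1,   2, 361],
    [7, 720,   8],
    [9, 400, 500],
])

# Row and column positions of every offending value outside the first column.
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)

# cols indexes into the slice, so add 1 to address the full array.
# The right-hand side is evaluated first, so each target gets its row's column-0 value.
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
print(arm_resets)
```

Unlike np.where, this modifies arm_resets in place and only touches the entries that exceed the threshold.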