mask value range column-wise only if it occurs k-times

Time:09-30

I have a DataFrame with 4 columns. If a column contains exactly two values between 1 and 8 (inclusive), those values should be changed to 500; all other values should remain the same.

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [0,9,12,10,11],
                        'b': [1,8,90,2,0],
                        'c': [0,5,3,10,18],
                        'd': [15,43,90,14,87]})

Here, only the third column (c) should have its values (5 and 3) replaced, as it has exactly two values between 1 and 8. Columns a and d have no occurrences, and column b has three (1, 8, 2). I can create a mask that applies to all instances, but I cannot work out how to restrict it by the number of occurrences per column. Any suggestions are welcome, thank you.

# convert zeros to nan
df.replace(0, np.nan, inplace=True)
mask1 = df.lt(9)
df.mask(mask1, 500)

CodePudding user response:

Try this:

>>> df[(df.ge(1) & df.le(8) & (df.ge(1) & df.le(8)).sum().eq(2))] = 500
>>> df
    a   b    c   d
0   0   1    0  15
1   9   8  500  43
2  12  90  500  90
3  10   2   10  14
4  11   0   18  87

Or, better, assign the condition to a variable first:

>>> conds = df.ge(1) & df.le(8)
>>> df[conds & conds.sum().eq(2)] = 500
>>> df
    a   b    c   d
0   0   1    0  15
1   9   8  500  43
2  12  90  500  90
3  10   2   10  14
4  11   0   18  87
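
If you need the same thing for other ranges or counts, the idea can be wrapped in a small helper. This is only a sketch: the name `mask_if_count` and its parameters are my own invention, not a pandas API, and it assumes the range is inclusive at both ends and that "k times" means exactly k. It relies on the same broadcasting as above, where the per-column boolean Series is aligned with the DataFrame's column labels.

```python
import pandas as pd


def mask_if_count(df, low, high, k, value):
    """Replace values in [low, high] with `value`, but only in columns
    that contain exactly `k` such values. Returns a new DataFrame.

    Hypothetical helper for illustration; not part of pandas.
    """
    in_range = df.ge(low) & df.le(high)   # element-wise: low <= x <= high
    hit_cols = in_range.sum().eq(k)       # per column: exactly k hits?
    # DataFrame & Series aligns the Series index with the column labels,
    # so every row of a non-matching column becomes False.
    return df.mask(in_range & hit_cols, value)


df = pd.DataFrame(data={'a': [0, 9, 12, 10, 11],
                        'b': [1, 8, 90, 2, 0],
                        'c': [0, 5, 3, 10, 18],
                        'd': [15, 43, 90, 14, 87]})

out = mask_if_count(df, low=1, high=8, k=2, value=500)   # only column c changes
out3 = mask_if_count(df, low=1, high=8, k=3, value=500)  # only column b changes
```

Unlike the in-place assignment above, `DataFrame.mask` leaves `df` untouched and returns a copy, which makes it easier to try different values of `k` on the same data.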