I have a DataFrame with 4 columns. If a column contains exactly two values between 1 and 8 (inclusive), those values should be changed to 500; all other values should remain the same.
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'a': [0, 9, 12, 10, 11],
                        'b': [1, 8, 90, 2, 0],
                        'c': [0, 5, 3, 10, 18],
                        'd': [15, 43, 90, 14, 87]})
Here, only column c should have its values replaced (5 and 3, in rows 1 and 2), as it has exactly two values between 1 and 8. Columns a and d have no such values, and column b has three (1, 2, 8). I can create a mask that applies to all matching values, but I cannot limit it by the number of occurrences per column. Any suggestions welcome, thank you.
# convert zeros to nan
df.replace(0, np.nan, inplace=True)
mask1 = df.lt(9)
df.mask(mask1, 500)
CodePudding user response:
Try this:
>>> df[(df.ge(1) & df.le(8) & (df.ge(1) & df.le(8)).sum().eq(2))] = 500
>>> df
    a   b    c   d
0   0   1    0  15
1   9   8  500  43
2  12  90  500  90
3  10   2   10  14
4  11   0   18  87
>>>
Or, better, assign the condition to a variable first:
>>> conds = df.ge(1) & df.le(8)
>>> df[conds & conds.sum().eq(2)] = 500
>>> df
    a   b    c   d
0   0   1    0  15
1   9   8  500  43
2  12  90  500  90
3  10   2   10  14
4  11   0   18  87
>>>
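If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: the name `replace_if_count` and its parameters (`low`, `high`, `count`, `new_value`) are made up for illustration, and it uses `DataFrame.mask` to return a new frame instead of assigning in place as above.

```python
import pandas as pd


def replace_if_count(df, low=1, high=8, count=2, new_value=500):
    """Hypothetical helper: replace in-range values only in columns
    that contain exactly `count` of them."""
    # Boolean frame: True where a value lies in [low, high]
    in_range = df.ge(low) & df.le(high)
    # Boolean Series indexed by column name: True where the column
    # has exactly `count` in-range values
    qualifying = in_range.sum().eq(count)
    # The Series is aligned on column labels, so a column's mask
    # survives only if that column qualifies
    return df.mask(in_range & qualifying, new_value)


df = pd.DataFrame(data={'a': [0, 9, 12, 10, 11],
                        'b': [1, 8, 90, 2, 0],
                        'c': [0, 5, 3, 10, 18],
                        'd': [15, 43, 90, 14, 87]})

result = replace_if_count(df)
print(result)
```

The key step is `in_range & qualifying`: the per-column counts come back as a Series indexed by column name, so combining it with the boolean frame switches the mask off for every column whose count is not exactly 2.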