I would like to replace values in a column, but only to the values seen after an specific value
for example, I have the following dataset:
In [108]: df=pd.DataFrame([[12,13,14,15,16,17],[4,10,5,6,1,3],[1, 3,5,4,9,1],[2, 4, 1,8,3,4], [4, 2, 6,7,1,8]], columns=['ID','time,'A', 'B', 'C'])
In [109]: df
Out[109]:
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 4 8 7
4 16 1 9 3 1
5 17 3 1 4 8
and I want to change for column "A" all the values that come after 5 for a 1, for column "B" all the values that come after 1 for 6, for column "C" change all the values after 7 for a 5. so it will look like this:
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 1 6 7
4 16 1 1 6 5
5 17 3 1 6 5
I know that I could use where
to get sort of a similar effect, but if I put a condition like df["A"] = np.where(x!=5,1,x)
, but obviously this will change the values before 5 as well. I can't think of anything else at the moment.
Thanks for the help.
CodePudding user response:
Use DataFrame.mask
with shifted valeus by DataFrame.shift
, compared by dictioanry and for next True
s is used DataFrame.cummax
:
df=pd.DataFrame([[12,13,14,15,16,17],[4,10,5,6,1,3],
[1, 3,5,4,9,1],[2, 4, 1,8,3,4], [4, 2, 6,7,1,8]],
index=['ID','time','A', 'B', 'C']).T
after = {'A':5, 'B':1, 'C': 7}
new = {'A':1, 'B':6, 'C': 5}
cols = list(after.keys())
s = pd.Series(new)
df[cols] = df[cols].mask(df[cols].shift().eq(after).cummax(), s, axis=1)
print (df)
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 1 6 7
4 16 1 1 6 5
5 17 3 1 6 5