I hope you can help me. For each ID, I want to keep all activities up to and including a target activity.
I have a DataFrame like this:
ID | Number | Activity |
---|---|---|
1 | 1 | Get Up |
1 | 2 | Wash |
1 | 3 | Dress Up |
2 | 1 | Get Up |
2 | 2 | Dress Up |
2 | 3 | Eat |
2 | 4 | Work |
My target activity is Dress Up, so for each ID I need to find the Number of that activity and remove all rows with a higher Number. The expected output:
ID | Number | Activity |
---|---|---|
1 | 1 | Get Up |
1 | 2 | Wash |
1 | 3 | Dress Up |
2 | 1 | Get Up |
2 | 2 | Dress Up |
I have tried to use the where function, but it removes all rows except the ones with the target activity:
import pandas as pd
df = pd.read_csv('data.csv')
End_act = 'Dress Up'
cond = df['Activity'] == End_act
df = df[df['Number'] <= df['Number'].where(cond)]
CodePudding user response:
Use GroupBy.cummax on a mask that compares the Activity column with End_act. Reverse the row order with DataFrame.iloc first, so that the mask becomes True for End_act and for all rows before it in each group, then restore the original order and filter with boolean indexing:
End_act = 'Dress Up'
m = (df.iloc[::-1]
       .assign(new = lambda x: x['Activity'].eq(End_act))
       .groupby('ID')['new']
       .cummax())
df = df[m.iloc[::-1]]
print(df)
   ID  Number  Activity
0   1       1    Get Up
1   1       2      Wash
2   1       3  Dress Up
3   2       1    Get Up
4   2       2  Dress Up
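To try the reversed cummax approach without a CSV file, here is a minimal self-contained sketch. It assumes the sample data from the question, and the helper name `keep_until_target` and its parameters are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "Number": [1, 2, 3, 1, 2, 3, 4],
    "Activity": ["Get Up", "Wash", "Dress Up", "Get Up", "Dress Up", "Eat", "Work"],
})


def keep_until_target(frame, target, group_col="ID", activity_col="Activity"):
    """Hypothetical helper: keep each group's rows up to and including the target row."""
    # Reverse the rows so each group is scanned from its last row upwards.
    rev = frame.iloc[::-1]
    # Once the target is seen, cummax stays True for the remaining rows,
    # which are the rows at or before the target in the original order.
    keep = rev[activity_col].eq(target).groupby(rev[group_col]).cummax()
    # Restore the original row order before boolean indexing.
    return frame[keep.iloc[::-1]]


result = keep_until_target(df, "Dress Up")
print(result)
```

Two behaviours follow from this design: if the target occurs more than once in a group, rows up to its last occurrence are kept, and a group that never contains the target is dropped entirely because its mask stays False.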
Your solution can be adapted with DataFrameGroupBy.idxmax: set Number as the index, so that idxmax returns, for each ID, the Number of the first row that matches End_act:
End_act = 'Dress Up'
s = (df.set_index('Number')
       .assign(new = lambda x: x['Activity'].eq(End_act))
       .groupby('ID')['new']
       .transform('idxmax'))
df = df[df['Number'].le(s.to_numpy())]
print(df)
   ID  Number  Activity
0   1       1    Get Up
1   1       2      Wash
2   1       3  Dress Up
3   2       1    Get Up
4   2       2  Dress Up
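As for why the attempt in the question kept only the target rows: `Series.where(cond)` replaces every non-matching row with NaN, and a comparison against NaN is always False. A minimal sketch of one way to keep the `where` idea, assuming the sample data from the question, is to broadcast the target Number to every row of the same ID with `transform('min')`. The names `target_number` and `limit` are illustrative and do not come from the original posts.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "Number": [1, 2, 3, 1, 2, 3, 4],
    "Activity": ["Get Up", "Wash", "Dress Up", "Get Up", "Dress Up", "Eat", "Work"],
})

End_act = "Dress Up"
cond = df["Activity"].eq(End_act)

# NaN on every row except the target rows, which keep their Number.
target_number = df["Number"].where(cond)

# Spread each ID's target Number over all rows of that ID ("min" skips NaN).
limit = target_number.groupby(df["ID"]).transform("min")

# Rows whose Number exceeds the limit, or whose limit is NaN, are dropped.
result = df[df["Number"].le(limit)]
print(result)
```

With `min`, a group that contains the target more than once is cut at the first occurrence, and a group without the target is removed because its limit stays NaN.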