Current Dataframe :
key status score
A initial 2
A in-between 3
A intermediate 6
A pre-final 2
A final 3
B initial 1
B intermediate 7
B final 8
I am looking to get two seperate dataframe , dataframe 1 should only include rows before the status is intermediate and dataframe 2 should only include rows after the status is intermediate
Expected Dataframes :
Dataframe 1 :
key status score
A initial 2
A in-between 3
B initial 1
Dataframe 2 :
key status score
A pre-final 2
A final 3
B final 8
Tried :
cond = df.status.eq("intermediate")
mask = cond.groupby(df.key).cummax().shift(fill_value=False)
dataframe_1 = df[~mask]
dataframe_2 = df[mask]
with this i am not getting the required dataframes
CodePudding user response:
Remove shift
for first mask and for second filter out intermediate
values:
cond = df.status.eq("intermediate")
mask = cond.groupby(df.key).cummax()
dataframe_1 = df[~mask]
print (dataframe_1)
key status score
0 A initial 2
1 A in-between 3
5 B initial 1
dataframe_2 = df[mask & ~cond]
print (dataframe_2)
key status score
3 A pre-final 2
4 A final 3
7 B final 8
CodePudding user response:
You can use a groupby.cummin
to set all rows from "intermediate" to False, then use boolean indexing:
m1 = df['status'].ne('intermediate')
m2 = m1.groupby(df['key']).cummin()
dataframe_1 = df[m2]
# key status score
# 0 A initial 2
# 1 A in-between 3
# 5 B initial 1
dataframe_2 = df[m1 & ~m2]
# key status score
# 3 A pre-final 2
# 4 A final 3
# 7 B final 8