Consider this DataFrame:
import pandas as pd
df = pd.DataFrame({"type": ["dog", "cat", "whale", "cat", "cat", "lion", "dog"],
                   "status": [False, True, True, False, False, True, True],
                   "age": [4, 6, 7, 7, 1, 7, 5]})
It looks like this:
    type  status  age
0    dog   False    4
1    cat    True    6
2  whale    True    7
3    cat   False    7
4    cat   False    1
5   lion    True    7
6    dog    True    5
I want to split this DataFrame into runs of consecutive identical values in the status column, and store the resulting pieces in a list.
Here I write the expected result manually:
result = [df.loc[[0],:], df.loc[1:2,:], df.loc[3:4,:], df.loc[5:6,:]]
So result[0] is this dataframe:
  type  status  age
0  dog   False    4
result[1] is this dataframe:
    type  status  age
1    cat    True    6
2  whale    True    7
result[2] is this dataframe:
  type  status  age
3  cat   False    7
4  cat   False    1
result[3] is this dataframe:
   type  status  age
5  lion    True    7
6   dog    True    5
What is the most efficient way to do that?
CodePudding user response:
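Flag the rows where status differs from the previous row, use the cumulative sum of those flags as a label for each run, and group by that label: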
s = df.status.ne(df.status.shift())
result = [y for _, y in df.groupby(s.cumsum())]
# result[0]
#   type  status  age
# 0  dog   False    4
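To see why this works, it may help to look at the intermediate values. The following is a minimal, self-contained sketch of the same approach on the DataFrame from the question; the names `starts` and `run_id` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["dog", "cat", "whale", "cat", "cat", "lion", "dog"],
    "status": [False, True, True, False, False, True, True],
    "age": [4, 6, 7, 7, 1, 7, 5],
})

# True wherever status differs from the previous row. The first row is
# always True, because shift() leaves a missing value there.
starts = df["status"].ne(df["status"].shift())

# Cumulative sum of the flags: all rows in the same run share one label.
run_id = starts.cumsum()

# One sub-DataFrame per run, in the original row order.
result = [group for _, group in df.groupby(run_id)]

print(run_id.tolist())                       # [1, 2, 2, 3, 3, 4, 4]
print([g.index.tolist() for g in result])    # [[0], [1, 2], [3, 4], [5, 6]]
```

Each sub-DataFrame keeps its original index labels, so the pieces match the manually written df.loc slices in the question.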