Split dataframe according to common consecutive sequences

Time:12-31

Let's consider this DataFrame:

import pandas as pd

df = pd.DataFrame({"type" : ["dog", "cat", "whale", "cat", "cat", "lion", "dog"],
                   "status" : [False, True, True, False, False, True, True],
                   "age" : [4, 6, 7, 7, 1, 7, 5]})

It looks like this:

    type  status  age
0    dog   False    4
1    cat    True    6
2  whale    True    7
3    cat   False    7
4    cat   False    1
5   lion    True    7
6    dog    True    5

I want to split this dataframe into runs of consecutive identical values in the status column, and store the resulting pieces in a list.
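To make "runs of consecutive identical values" concrete: a plain group-by on status merges rows that share a value even when they are not adjacent, which gives only two groups. The standard-library itertools.groupby, by contrast, only merges adjacent equal items, so it yields one group per run. This is a minimal contrast sketch using the same data, not the pandas approach used in the answer:

```python
import pandas as pd
from itertools import groupby

df = pd.DataFrame({"type": ["dog", "cat", "whale", "cat", "cat", "lion", "dog"],
                   "status": [False, True, True, False, False, True, True],
                   "age": [4, 6, 7, 7, 1, 7, 5]})

# Grouping by value merges non-adjacent rows: only 2 groups (False, True)
by_value = [list(g.index) for _, g in df.groupby("status")]
# by_value == [[0, 3, 4], [1, 2, 5, 6]]

# itertools.groupby only merges *adjacent* equal items, so each group is one run;
# we pair each value with its position so the groups can be reported as row numbers
by_run = [[i for i, _ in g]
          for _, g in groupby(enumerate(df["status"]), key=lambda p: p[1])]
# by_run == [[0], [1, 2], [3, 4], [5, 6]]
```

The second grouping matches the four pieces wanted in the expected result.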

Here is the expected result, written out manually:

result = [df.loc[[0],:], df.loc[1:2,:], df.loc[3:4,:], df.loc[5:6,:]]

So result[0] is this dataframe:

  type  status  age
0  dog   False    4

result[1] is this dataframe:

    type  status  age
1    cat    True    6
2  whale    True    7

result[2] is this dataframe:

  type  status  age
3  cat   False    7
4  cat   False    1

result[3] is this dataframe:

   type  status  age
5  lion    True    7
6   dog    True    5

What is the most efficient way to do that?

Answer:

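Mark every row where status differs from the previous row, use the cumulative sum of those marks as a run label, and group by that label: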

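# True wherever status differs from the previous row; row 0 is always True
# because shift() leaves a NaN there
s = df.status.ne(df.status.shift())
# The running total of those change points labels each run: 1, 2, 2, 3, 3, 4, 4
result = [y for _, y in df.groupby(s.cumsum())]

# result[0]
#   type  status  age
# 0  dog   False    4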
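Since the question asks about efficiency, here is an alternative sketch (not part of the original answer, and not benchmarked) that finds the run boundaries with NumPy and slices the frame positionally with iloc instead of going through groupby. The helper name split_runs is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"type": ["dog", "cat", "whale", "cat", "cat", "lion", "dog"],
                   "status": [False, True, True, False, False, True, True],
                   "age": [4, 6, 7, 7, 1, 7, 5]})


def split_runs(frame, col):
    """Hypothetical helper: split `frame` into one sub-frame per run of equal values in `col`."""
    values = frame[col].to_numpy()
    if len(values) == 0:
        return []  # nothing to split
    # Positions where a new run starts (value differs from the one before it)
    starts = np.flatnonzero(values[1:] != values[:-1]) + 1
    # Run boundaries: [0, start_1, ..., start_k, len(frame)]
    bounds = np.concatenate(([0], starts, [len(values)]))
    # Slice positionally between consecutive boundaries
    return [frame.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


result = split_runs(df, "status")
```

The slices keep the original index labels, so result[1] contains rows 1 and 2, as in the expected output. Whether this is faster than the groupby version on real data would need to be measured.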