How to split/slice a Pandas dataframe into multiple dataframes by column value?-CodePudding

I have the following Pandas DF:

0 | A
1 | B
2 | A
3 | A
4 | B
5 | A
6 | B
7 | B
8 | A
9 | A

I want to slice this single DF into multiple ones along B values so that I omit all Bs and got all the consecutive A rows as the respective resulting DFs like this:

df#1:

0 | A

df#2:

2 | A
3 | A

df#3:

5 | A

df#4:

8 | A
9 | A

Order of A-rows must be kept. How to perform this action? (The actual task is a time series of relevant events that must be handled as one event separated by irrelevant events.)

CodePudding user response：

You can create loop by consecutive A values created by mask for compare column col with groups created by chain mask with shifted inverted values with cumulative sum:

m = df.col.eq('A')

for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum()):
    print (g)
  col
0   A
  col
2   A
3   A
  col
5   A
  col
8   A
9   A

or dictonary of DataFrames:

m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum())}
    
print (d)
{1:   col
0   A, 2:   col
2   A
3   A, 3:   col
5   A, 4:   col
8   A
9   A}

print (d[1])
  col
0   A

Another similar idea:

m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby(m.ne(m.shift()).cumsum())}
    
print (d)
{1:   col
0   A, 3:   col
2   A
3   A, 5:   col
5   A, 7:   col
8   A
9   A}

CodePudding user response：

You can use itertools.groupby to filter out the portions you want -

from itertools import groupby
dfs = [pd.DataFrame.from_records(list(g), 
       columns=df.reset_index().columns, index='index')
       for k, g in 
       groupby(df.to_records(), key=lambda x: x[2])
       if k.strip() == 'A']
df_1, df_2, df_3, df_4 = dfs #This is probably not necessary

Output

# df_1
       0   1
index       
0      0   A

# df_2
       0   1
index       
2      2   A
3      3   A