I have the following Pandas DF:
0 | A
1 | B
2 | A
3 | A
4 | B
5 | A
6 | B
7 | B
8 | A
9 | A
I want to slice this single DF into multiple ones along B values so that I omit all Bs and got all the consecutive A rows as the respective resulting DFs like this:
df#1:
0 | A
df#2:
2 | A
3 | A
df#3:
5 | A
df#4:
8 | A
9 | A
Order of A-rows must be kept. How to perform this action? (The actual task is a time series of relevant events that must be handled as one event separated by irrelevant events.)
CodePudding user response:
You can create loop by consecutive A
values created by mask for compare column col
with groups created by chain mask with shifted inverted values with cumulative sum
:
m = df.col.eq('A')
for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum()):
print (g)
col
0 A
col
2 A
3 A
col
5 A
col
8 A
9 A
or dictonary of DataFrames:
m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby((m & ~m.shift(fill_value=False)).cumsum())}
print (d)
{1: col
0 A, 2: col
2 A
3 A, 3: col
5 A, 4: col
8 A
9 A}
print (d[1])
col
0 A
Another similar idea:
m = df.col.eq('A')
d = {i: g for i, g in df[m].groupby(m.ne(m.shift()).cumsum())}
print (d)
{1: col
0 A, 3: col
2 A
3 A, 5: col
5 A, 7: col
8 A
9 A}
CodePudding user response:
You can use itertools.groupby
to filter out the portions you want -
from itertools import groupby
dfs = [pd.DataFrame.from_records(list(g),
columns=df.reset_index().columns, index='index')
for k, g in
groupby(df.to_records(), key=lambda x: x[2])
if k.strip() == 'A']
df_1, df_2, df_3, df_4 = dfs #This is probably not necessary
Output
# df_1
0 1
index
0 0 A
# df_2
0 1
index
2 2 A
3 3 A