I have a pandas dataframe that looks something like this:
A BB
1 foo.bar
2 foo.bar
3 foo.foo
4 foo.bar
5 foo.bar
6 foo.foo
I expect to get two dataframes out of it, based on this list of lists:
[[False, False, True], [False, False, True]]
OUTPUT should be:
df1:
A BB
1 foo.bar
2 foo.bar
3 foo.foo
df2:
A BB
4 foo.bar
5 foo.bar
6 foo.foo
CodePudding user response:
Numpy:
- flatnonzero to find where the 'foo.foo' rows are
- split to divide the dataframe up accordingly
import numpy as np
np.split(df, np.flatnonzero(df.BB.eq('foo.foo'))[:-1] + 1)
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
Addressing @mozway's comment
list(filter(
    lambda d: not d.empty,
    np.split(df, np.flatnonzero(df.BB.eq('foo.foo')) + 1)
))
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
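For reference, here is a self-contained sketch of the same idea that slices with iloc instead of passing the dataframe to np.split (which may emit a deprecation warning on recent pandas releases). The sample frame is rebuilt from the question; the names cuts and bounds are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Positions just past each 'foo.foo' row: these are the cut points.
cuts = np.flatnonzero(df["BB"].eq("foo.foo").to_numpy()) + 1  # array([3, 6])

# Add 0 and len(df) as outer bounds; np.unique sorts and drops duplicates,
# so no empty chunk appears when the last row is itself 'foo.foo'.
bounds = np.unique(np.concatenate(([0], cuts, [len(df)])))

# Slice the frame between each pair of consecutive bounds.
frames = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
```

Because the bounds are deduplicated, this variant needs no filter for empty frames, and trailing rows after the last 'foo.foo' would end up in a final chunk of their own.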
CodePudding user response:
You can
- get the rows where df.BB equals 'foo.foo',
- shift that by one row,
- apply cumulative sum to that and
- group by the resulting indices.
You end up with a groupby object that you can turn into a list of sub-dfs.
>>> groups = df.groupby(df.BB.eq('foo.foo').shift(fill_value=0).cumsum())
>>> frames = [frame for _, frame in groups]
>>> frames # list of sub-dfs
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
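To see why this works, it can help to look at the intermediate series one step at a time. The sketch below rebuilds the sample frame from the question; the variable names is_end, starts_new and group_id are made up for illustration, and fill_value=False is used here only to keep the shifted series boolean.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# True on the last row of each chunk: F F T F F T
is_end = df["BB"].eq("foo.foo")

# Shift down by one row so True marks the first row of the next chunk: F F F T F F
starts_new = is_end.shift(fill_value=False)

# The running total gives each row its chunk number: 0 0 0 1 1 1
group_id = starts_new.cumsum()

# One sub-dataframe per chunk number.
frames = [frame for _, frame in df.groupby(group_id)]
```

The shift matters: without it the cumulative sum would increase on the 'foo.foo' row itself, so that row would land in the following chunk instead of closing its own.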
CodePudding user response:
Is this what you expect?
m = len(df) // 2
df1, df2 = df.iloc[:m], df.iloc[m:]
Output:
>>> df1
   A       BB
0  1  foo.bar
1  2  foo.bar
2  3  foo.foo
>>> df2
   A       BB
3  4  foo.bar
4  5  foo.bar
5  6  foo.foo
Or use np.split
df1, df2 = np.split(df, 2)
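If the chunks are not guaranteed to be equal halves, one option is to take the chunk sizes from the list of lists in the question. This is only a sketch under that assumption: it treats each inner list as describing one chunk and uses its length as the number of rows, and the name masks is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# The list of lists from the question; each inner list is assumed
# to describe one chunk, with one entry per row.
masks = [[False, False, True], [False, False, True]]

frames = []
start = 0
for mask in masks:
    stop = start + len(mask)            # chunk length = length of the inner list
    frames.append(df.iloc[start:stop])  # positional slice, end-exclusive
    start = stop

df1, df2 = frames
```

This uses only iloc slicing, so it does not depend on np.split accepting a dataframe.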