Split pandas dataframe into multiple dataframes with list of lists as mask

I have a pandas dataframe that looks something like this:

A BB
1 foo.bar
2 foo.bar
3 foo.foo
4 foo.bar
5 foo.bar
6 foo.foo

I expect to get two dataframes out of it, based on this list of lists:

[[False, False, True], [False, False, True]]

The output should be:

df1:

A BB
1 foo.bar
2 foo.bar
3 foo.foo

df2:

A BB
4 foo.bar
5 foo.bar
6 foo.foo

CodePudding user response：

With NumPy:

np.flatnonzero to find the positions of the 'foo.foo' rows
np.split to divide the dataframe at those positions

import numpy as np

np.split(df, np.flatnonzero(df.BB.eq('foo.foo'))[:-1] + 1)

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]

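Addressing @mozway's comment: split after every 'foo.foo' row, including the last one, and drop the empty frame this leaves when the data ends with 'foo.foo'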

list(filter(
    lambda d: not d.empty,
    np.split(df, np.flatnonzero(df.BB.eq('foo.foo')) + 1)
))

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
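
If np.split on a DataFrame warns or is unavailable in your pandas version, the same split points can be fed to iloc instead. This is only a sketch, assuming the sample data from the question; the names ends, bounds and frames are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Positions just after each 'foo.foo' row
ends = np.flatnonzero(df["BB"].eq("foo.foo")) + 1

# Add the start (0) and the end (len(df)) so trailing rows are kept;
# np.unique sorts and removes duplicates, so no empty slice is produced
bounds = np.unique(np.concatenate(([0], ends, [len(df)])))

# Slice the frame between consecutive boundaries
frames = [df.iloc[s:e] for s, e in zip(bounds[:-1], bounds[1:])]
```

Because duplicate boundaries are removed up front, no filter for empty frames should be needed afterwards.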

CodePudding user response：

You can

get the rows where df.BB equals 'foo.foo'
shift that by one row
apply cumulative sum to that and
group by the resulting indices.

You end up with a groupby object that you can turn into a list of sub-dfs.

>>> groups = df.groupby(df.BB.eq('foo.foo').shift(fill_value=0).cumsum())
>>> frames = [frame for _, frame in groups]
>>> frames # list of sub-dfs
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
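
To see why this works, it may help to look at the intermediate steps one at a time. The following is a sketch on the sample data from the question; the variable names is_end, starts_after and group_id are made up for illustration, and fill_value=False is used here instead of 0 to keep the shifted series boolean.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# True on the last row of each chunk
is_end = df["BB"].eq("foo.foo")

# Shifted down by one row: True on the first row of the following chunk
starts_after = is_end.shift(fill_value=False)

# Running count of chunk starts gives one label per chunk
group_id = starts_after.cumsum()

frames = [frame for _, frame in df.groupby(group_id)]
```

On this data group_id labels the first three rows 0 and the last three rows 1, which is what groupby then splits on.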

CodePudding user response：

Is this what you expect?

m = len(df) // 2
df1, df2 = df.iloc[:m], df.iloc[m:]

Output:

>>> df1
   A       BB
0  1  foo.bar
1  2  foo.bar
2  3  foo.foo

>>> df2
   A       BB
3  4  foo.bar
4  5  foo.bar
5  6  foo.foo

Or use np.split to cut the dataframe into two equal parts:

df1, df2 = np.split(df, 2)
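
If the list of lists from the question is itself the source of truth, the chunk sizes can also be read from the lengths of its sub-lists. This is a sketch, assuming the sub-list lengths add up to len(df); mask, bounds and frames are illustrative names.

```python
from itertools import accumulate

import pandas as pd

# Sample data and mask from the question
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})
mask = [[False, False, True], [False, False, True]]

# Cumulative boundaries from the sub-list lengths, starting at 0
bounds = list(accumulate((len(m) for m in mask), initial=0))

# One frame per sub-list, sliced between consecutive boundaries
frames = [df.iloc[s:e] for s, e in zip(bounds, bounds[1:])]
df1, df2 = frames
```

Unlike the halving approach, this also covers sub-lists of unequal length.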