Split pandas dataframe into multiple dataframes with list of lists as mask

Time:04-01

I have a pandas dataframe that looks something like this:

A BB
1 foo.bar
2 foo.bar
3 foo.foo
4 foo.bar
5 foo.bar
6 foo.foo

I expect to get two dataframes out of it, based on this list of lists:

[[False, False, True], [False, False, True]]

OUTPUT should be:

df1:

A BB
1 foo.bar
2 foo.bar
3 foo.foo

df2:

A BB
4 foo.bar
5 foo.bar
6 foo.foo

CodePudding user response:

NumPy:

  • flatnonzero to find where the 'foo.foo' rows are
  • split to divide the dataframe up accordingly

import numpy as np

np.split(df, np.flatnonzero(df.BB.eq('foo.foo'))[:-1] + 1)

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]

Addressing @mozway's comment

list(filter(
    lambda d: not d.empty,
    np.split(df, np.flatnonzero(df.BB.eq('foo.foo')) + 1)
))

[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
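
The np.split calls above pass a DataFrame straight to NumPy, which depends on how pandas cooperates with np.split; some recent pandas releases are reported to emit a deprecation warning for this. As a hedged alternative, here is a minimal sketch that computes the same boundaries with flatnonzero but slices with iloc instead. The sample frame is rebuilt from the question's data, and the names ends, starts and frames are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame)
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Position just past each 'foo.foo' row, i.e. where each chunk ends
ends = np.flatnonzero(df["BB"].eq("foo.foo")) + 1

# Keep any trailing rows that come after the last 'foo.foo'
if len(ends) == 0 or ends[-1] != len(df):
    ends = np.append(ends, len(df))

# Each chunk starts where the previous one ended
starts = np.concatenate(([0], ends[:-1]))

frames = [df.iloc[s:e] for s, e in zip(starts, ends)]
```

Because the end of the frame is only appended when it is missing, no empty trailing frame is produced and no filter step is needed.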

CodePudding user response:

You can

  • get the rows where df.BB equals 'foo.foo'
  • shift that by one row
  • apply cumulative sum to that and
  • group by the resulting indices.

You end up with a groupby object that you can turn into a list of sub-dfs.

>>> groups = df.groupby(df.BB.eq('foo.foo').shift(fill_value=0).cumsum())
>>> frames = [frame for _, frame in groups]
>>> frames # list of sub-dfs
[   A       BB
 0  1  foo.bar
 1  2  foo.bar
 2  3  foo.foo,
    A       BB
 3  4  foo.bar
 4  5  foo.bar
 5  6  foo.foo]
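
To make the four steps easier to follow, here is a small sketch that keeps each intermediate result in its own variable. The sample frame is rebuilt from the question's data, the variable names are only illustrative, and fill_value=False is used here in place of 0 to keep the shifted series boolean.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame)
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})

# Step 1: True on the rows that close a chunk
is_end = df["BB"].eq("foo.foo")

# Step 2: shift down by one row so True marks the first row of the next chunk
starts_new = is_end.shift(fill_value=False)

# Step 3: the running total gives every row the number of its chunk
group_id = starts_new.cumsum()

# Step 4: group on that label and collect the sub-frames
frames = [frame for _, frame in df.groupby(group_id)]
```

For the sample data, group_id labels the first three rows 0 and the last three rows 1, so the groupby yields exactly two frames.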

CodePudding user response:

Is this what you expect?

m = len(df) // 2
df1, df2 = df.iloc[:m], df.iloc[m:]

Output:

>>> df1
   A       BB
0  1  foo.bar
1  2  foo.bar
2  3  foo.foo

>>> df2
   A       BB
3  4  foo.bar
4  5  foo.bar
5  6  foo.foo

Or use np.split:

import numpy as np

df1, df2 = np.split(df, 2)
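
Splitting in half only works when both chunks have the same size. If the list of lists from the question is available, the length of each sub-list can serve as the chunk size. This is a minimal sketch under that assumption; the variable mask and the other names are illustrative, and the sample frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame)
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "BB": ["foo.bar", "foo.bar", "foo.foo", "foo.bar", "foo.bar", "foo.foo"],
})
mask = [[False, False, True], [False, False, True]]

# One chunk per sub-list; its length is the number of rows in that chunk
sizes = [len(chunk) for chunk in mask]
if sum(sizes) != len(df):
    raise ValueError("mask lengths do not add up to the number of rows")

# Cumulative sizes give the row positions where each chunk starts and stops
bounds = np.cumsum([0] + sizes)

frames = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]
df1, df2 = frames
```

Only the sub-list lengths are used here, not the True/False values themselves, so this also handles chunks of unequal size.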