Splitting a DataFrame into two subsets by a boolean condition in a one-liner

Time:10-27

Is there a way to apply a condition to a DataFrame and get two return values: one with the rows that fulfill the condition and one with the remaining rows?

df_with_condition, df_without_condition = df.[some conditional actions]

CodePudding user response:

You could split the operation into two statements, where condition is a boolean Series such as df['A'] >= 1:

df_with_condition = df[condition]
df_without_condition = df[~condition]
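As a concrete illustration of the mask approach, here is a minimal sketch. The sample frame and the `A >= 1` condition are assumptions chosen for illustration, not part of the question. Computing the mask once and unpacking a tuple also gives a single-line split:

```python
import pandas as pd

# Illustrative data; the columns and values are assumptions.
df = pd.DataFrame({'A': [0, 1, 2, 0, 1], 'B': list('ABCDE')})

# Build the boolean mask once, then index with it and with its negation.
mask = df['A'].ge(1)
df_with_condition, df_without_condition = df[mask], df[~mask]

print(df_with_condition)
print('---')
print(df_without_condition)
```

Because both subsets come from plain boolean indexing, this works even when one of them ends up empty.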

CodePudding user response:

If the question is about performing the boolean splitting task in a single line, you can use a generator expression with groupby:

df_false, df_true = (g for k,g in df.groupby(boolean_condition))

By default groupby sorts its keys, so the False group comes before the True group. If you want the True group first, invert the boolean with ~:

df_true, df_false = (g for k,g in df.groupby(~boolean_condition))

Example:

import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 1], 'B': list('ABCDE')})

# split by A<1 / A≥1
df1, df2 = (g for k,g in df.groupby(df['A'].ge(1)))

print(df1)
print('---')
print(df2)

Output:

   A  B
0  0  A
3  0  D
---
   A  B
1  1  B
2  2  C
4  1  E
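One caveat with the groupby one-liner: if every row falls on the same side of the condition, groupby yields only one group and the two-name unpacking raises a ValueError. A small helper based on boolean indexing avoids that. This is only a sketch; `split_by_mask` is a hypothetical name, not a pandas function:

```python
import pandas as pd

def split_by_mask(df, mask):
    """Return (rows where mask is True, rows where mask is False).

    Hypothetical helper; always returns two frames, either may be empty.
    """
    return df[mask], df[~mask]

df = pd.DataFrame({'A': [0, 1, 2, 0, 1], 'B': list('ABCDE')})

# Every row satisfies A >= 0, so groupby would produce a single group here.
df_true, df_false = split_by_mask(df, df['A'].ge(0))

print(len(df_true), len(df_false))  # 5 0
```

The trade-off is that the mask is applied twice, whereas groupby partitions the rows in one pass.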