Creating a new Dataframe based on rows with certain values and removing the rows from the original D-CodePudding

I try to separate a Dataframe based on rows with a certain value in multiple columns, so that the original Dataframe is split in two with all rows containing the value in one Dataframe and the other Dataframe with the residual rows.

df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))

df
    A   B   C   D
0   51  86  15  80
1   61  53  75  66
2   80  48  23  58
3   86  25  37  99
4   50  11  87  71
... ... ... ... ...
95  34  40  43  40
96  89  16  83  72
97  97  32  24  26
98  27  83  75  29
99  24  50  40  43
100 rows × 4 columns

df[~df.isin([-1])].dropna()
    A   B   C   D
0   51  86  15  80.0
1   61  53  75  66.0
2   80  48  23  58.0
3   86  25  37  99.0
4   50  11  87  71.0
... ... ... ... ...
95  34  40  43  40.0
96  89  16  83  72.0
97  97  32  24  26.0
98  27  83  75  29.0
99  24  50  40  43.0
98 rows × 4 columns

df[df.isin([-1])].dropna()
A   B   C   D

is what i tried so far and the first part worked correctly. However df[df.isin([-1])].dropna() failed.

CodePudding user response：

Possible solution is the following:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))
df

# df1 = df[~df.isin([80])].dropna().reset_index(drop=True)
# or
df1 = df[~df.eq(80).any(1)].reset_index(drop=True)
df1

df2 = df[df.eq(80).any(1)].reset_index(drop=True)
df2

CodePudding user response：

Your code is almost correct. Use any(axis=1) to keep only one boolean value for each row instead of using dropna(how='all')

The same with a reproducible example:

import pandas as pd
import numpy as np

np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))

m = df.isin([-1]).any(axis=1)  # or df.eq(-1).any(axis=1)
df1, df2 = df[m], df[~m]

Output:

>>> df.assign(M=m)
   A  B  C  D      M
0 -1  0 -1 -1   True
1  1  0  1  1  False
2  1  1  1  1  False
3  1  1  0  0  False
4  0  1  1 -1   True
5  1  0  0  1  False
6 -1  0  1  0   True
7  0  0  0  0  False
8  1 -1  1  0   True
9  1  1  0  1  False

>>> df1
   A  B  C  D
0 -1  0 -1 -1
4  0  1  1 -1
6 -1  0  1  0
8  1 -1  1  0

>>> df2
   A  B  C  D
1  1  0  1  1
2  1  1  1  1
3  1  1  0  0
5  1  0  0  1
7  0  0  0  0
9  1  1  0  1