Having the following dataframe:
name | aaa | bbb |
---|---|---|
Mick | None | None |
Ivan | A | C |
Ivan-Peter | 1 | None |
Juli | 1 | P |
I want to get two dataframes.
- One with values, where we have None in columns
aaa
and/orbbb
, namedfilter_nulls
in my code - One where we do not have None at all.
df_out
in my code.
This is what I have tried and it does not produce the required dataframes.
import pandas as pd
df_out = {
'name': [ 'Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
'aaa': [None, 'A', '1', '1'],
'bbb': [None, 'C', None, 'P'],
}
print(df_out)
filter_nulls = df_out[df_out['aaa'].isnull()|(df_out['bbb'] is None)]
print(filter_nulls)
df_out = df_out.loc[filter_nulls].reset_index(level=0, drop=True)
print(df_out)
CodePudding user response:
Use:
#DataFrame from sample data
df_out = pd.DataFrame(df_out)
#filter columns names by list and test if NaN or None at least in one row
m = df_out[['aaa','bbb']].isna().any(axis=1)
#OR test both columns separately
m = df_out['aaa'].isna() | df_out['bbb'].isna()
#filter matched and not matched rows
df1 = df_out[m].reset_index(drop=True)
df2 = df_out[~m].reset_index(drop=True)
print (df1)
name aaa bbb
0 Mick None None
1 Ivan-Peter 1 None
print (df2)
name aaa bbb
0 Ivan A C
1 Juli 1 P
Another idea with DataFrame.dropna
and filter indices not exist in df2
:
df2 = df_out.dropna()
df1 = df_out.loc[df_out.index.difference(df2.index)].reset_index(drop=True)
df2 = df2.reset_index(drop=True)
CodePudding user response:
First of all one needs to convert df_out
to a dataframe with pandas.DataFrame
as follows
df_out = pd.DataFrame(df_out)
[Out]:
name aaa bbb
0 Mick None None
1 Ivan A C
2 Ivan-Peter 1 None
3 Juli 1 P
Then one can use, for both cases, pandas.Series.notnull
.
With values, where we have None in columns aaa and/or bbb, named filter_nulls in my code
df1 = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()]
[Out]:
name aaa bbb
0 Mick None None
2 Ivan-Peter 1 None
Where we do not have None at all. df_out in my code.
df2 = df_out[df_out['aaa'].notnull() & df_out['bbb'].notnull()]
[Out]:
name aaa bbb
1 Ivan A C
3 Juli 1 P
Notes:
If needed one can use
pandas.DataFrame.reset_index
to get the followingdf_new = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()].reset_index(drop=True) [Out]: name aaa bbb 0 Mick None None 1 Ivan-Peter 1 None