Home > Enterprise >  Removing None values from DataFrame in Python
Removing None values from DataFrame in Python

Time:10-25

Having the following dataframe:

name aaa bbb
Mick None None
Ivan A C
Ivan-Peter 1 None
Juli 1 P

I want to get two dataframes.

  • One with values, where we have None in columns aaa and/or bbb, named filter_nulls in my code
  • One where we do not have None at all. df_out in my code.

This is what I have tried and it does not produce the required dataframes.

import pandas as pd

df_out = {
    'name': [ 'Mick', 'Ivan', 'Ivan-Peter', 'Juli'],
    'aaa': [None, 'A', '1', '1'],
    'bbb': [None, 'C', None, 'P'],
}
print(df_out)

filter_nulls = df_out[df_out['aaa'].isnull()|(df_out['bbb'] is None)]
print(filter_nulls)

df_out = df_out.loc[filter_nulls].reset_index(level=0, drop=True)
print(df_out)

CodePudding user response:

Use:

#DataFrame from sample data
df_out = pd.DataFrame(df_out)

#filter columns names by list and test if NaN or None at least in one row
m = df_out[['aaa','bbb']].isna().any(axis=1)

#OR test both columns separately
m = df_out['aaa'].isna() | df_out['bbb'].isna()


#filter matched and not matched rows
df1 = df_out[m].reset_index(drop=True)
df2 = df_out[~m].reset_index(drop=True)
print (df1)
         name   aaa   bbb
0        Mick  None  None
1  Ivan-Peter     1  None

print (df2)
   name aaa bbb
0  Ivan   A   C
1  Juli   1   P

Another idea with DataFrame.dropna and filter indices not exist in df2:

df2 = df_out.dropna()
df1 = df_out.loc[df_out.index.difference(df2.index)].reset_index(drop=True)
df2 = df2.reset_index(drop=True)

CodePudding user response:

First of all one needs to convert df_out to a dataframe with pandas.DataFrame as follows

df_out = pd.DataFrame(df_out)

[Out]:

         name   aaa   bbb
0        Mick  None  None
1        Ivan     A     C
2  Ivan-Peter     1  None
3        Juli     1     P

Then one can use, for both cases, pandas.Series.notnull.

With values, where we have None in columns aaa and/or bbb, named filter_nulls in my code

df1 = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()]

[Out]:

         name   aaa   bbb
0        Mick  None  None
2  Ivan-Peter     1  None

Where we do not have None at all. df_out in my code.

df2 = df_out[df_out['aaa'].notnull() & df_out['bbb'].notnull()]

[Out]:

   name aaa bbb
1  Ivan   A   C
3  Juli   1   P

Notes:

  • If needed one can use pandas.DataFrame.reset_index to get the following

    df_new = df_out[~df_out['aaa'].notnull() | ~df_out['bbb'].notnull()].reset_index(drop=True)
    
    [Out]:
    
             name   aaa   bbb
    0        Mick  None  None
    1  Ivan-Peter     1  None
    
  • Related