Filter Nulls when converting pandas dataframe to dict

Time:07-20

I have this pandas DataFrame:

import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)

   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN

I want to convert every row into a dict:

def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k:v for k, v in result.items() if v is not np.nan}
    print(final_result)

So I apply the function above to each row:

df.apply(lambda row: convert_to_dict(row), axis=1)

But the result is not what I expected:

{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}

In the last row, both Discount and Language are NaN. I expected both to be filtered out, but only Language was removed.

How do I filter every NaN value out of the final result?

CodePudding user response:

Use pd.notna to filter out missing values. Inside your function:

final_result = {k:v for k, v in result.items() if pd.notna(v)}

Or, without apply, build the list directly from to_dict('records'):

final_result = [{k:v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}, 
 {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}, 
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
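As for why the original filter let Discount through: `v is not np.nan` compares object identity, not value. A NaN read back from a float column is usually a different object from `np.nan`, so the identity check keeps it. The sketch below illustrates this with a plain `float("nan")` standing in for such a value; `drop_missing` is a hypothetical helper name, not something from the question.

```python
import numpy as np
import pandas as pd

# A NaN that is a different Python object from np.nan, standing in for
# a value read back from a float64 column.
other_nan = float("nan")

identity_check = other_nan is not np.nan  # True: different objects, so the old filter keeps it
equality_check = other_nan == np.nan      # False: NaN never compares equal to anything
value_check = pd.notna(other_nan)         # False: recognised as missing by value


def drop_missing(record: dict) -> dict:
    """Return a copy of record without keys whose value is missing (hypothetical helper)."""
    return {k: v for k, v in record.items() if pd.notna(v)}


cleaned = drop_missing(
    {"Courses": "Hadoop", "Fee": 23000, "Discount": other_nan, "Language": np.nan}
)
print(cleaned)
```

Both kinds of NaN are dropped here, because `pd.notna` tests the value rather than the object.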
 

CodePudding user response:

You can use .to_dict('records') and filter out NaN with pandas.notna():

>>> [{k:v for k,v in dct.items() if pd.notna(v)} for dct in df.to_dict('records')]
[{'Courses': 'Spark',
  'Fee': 22000,
  'Duration': '30days',
  'Discount': 1000.0,
  'Language': 'Scala'},
 {'Courses': 'PySpark',
  'Fee': 25000,
  'Duration': '50days',
  'Discount': 2300.0,
  'Language': 'Python'},
 {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
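If you would rather keep the row-wise apply from the question, one possible alternative (not taken from the answers above) is to drop the missing entries from each row with Series.dropna() before converting it to a dict. A minimal sketch, assuming the same DataFrame as in the question:

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(
    technologies, columns=["Courses", "Fee", "Duration", "Discount", "Language"]
)

# Each row arrives as a Series; dropna() removes its missing entries
# before to_dict() turns what is left into a plain dict.
records = df.apply(lambda row: row.dropna().to_dict(), axis=1).tolist()
print(records)
```

The function passed to apply has to return the dict rather than print it, otherwise apply only collects None values.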