I have this pandas DataFrame:
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0, 'Scala'),
    ("PySpark", 25000, '50days', 2300.0, 'Python'),
    ("Hadoop", 23000, '55days', np.nan, np.nan)
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'Language'])
print(df)
   Courses    Fee Duration  Discount Language
0    Spark  22000   30days    1000.0    Scala
1  PySpark  25000   50days    2300.0   Python
2   Hadoop  23000   55days       NaN      NaN
I want to convert every row into a dict, leaving out any key whose value is NaN.
def convert_to_dict(row) -> dict:
    result = dict(row)
    final_result = {k:v for k, v in result.items() if v is not np.nan}
    print(final_result)
So I apply the function above to each row:
df.apply(lambda row: convert_to_dict(row), axis=1)
But the result is not what I expected:
{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'}
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'}
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': nan}
The last row has NaN in both Discount and Language.
I expected both to be filtered out, but only Language was removed.
How do I drop every key whose value is NaN from the final result?
CodePudding user response:
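Use pd.notna to filter out missing values, since v is not np.nan only tests object identity and misses NaN values that are a different float object: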
# Inside convert_to_dict, replace the identity check:
final_result = {k:v for k, v in result.items() if pd.notna(v)}

# Or build all the row dicts in one pass:
final_result = [{k:v for k, v in result.items() if pd.notna(v)}
                for result in df.to_dict('records')]
print(final_result)
[{'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'Language': 'Scala'},
{'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'Language': 'Python'},
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
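To see why the identity check in the question lets some NaN values through, here is a minimal sketch. The name boxed_nan is only illustrative: it stands in for a NaN that pandas has re-created as a new float object, which is presumably what happens to the value from the float Discount column.

```python
import numpy as np
import pandas as pd

# A freshly created NaN; it is a float, but not the same object as np.nan.
boxed_nan = float("nan")

# Identity compares objects, so this NaN is not recognised by "is np.nan".
identity_match = boxed_nan is np.nan

# Equality does not help either: NaN never compares equal, even to itself.
equal_to_itself = boxed_nan == boxed_nan

# pd.notna checks the value, so it treats every NaN the same way.
kept_by_notna = bool(pd.notna(boxed_nan))

print(identity_match, equal_to_itself, kept_by_notna)
```

All three flags come out False, which is why pd.notna (or pd.isna) is the safer test for missing values.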
CodePudding user response:
You can use .to_dict('records') and filter out NaN with pandas.notna():
>>> [{k:v for k,v in dct.items() if pd.notna(v)} for dct in df.to_dict('records')]
[{'Courses': 'Spark',
'Fee': 22000,
'Duration': '30days',
'Discount': 1000.0,
'Language': 'Scala'},
{'Courses': 'PySpark',
'Fee': 25000,
'Duration': '50days',
'Discount': 2300.0,
'Language': 'Python'},
{'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days'}]
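As an alternative that neither answer above uses, Series.dropna can remove the missing entries from each row before it is turned into a dict. This is only a sketch under the assumption that iterating row by row is acceptable for the size of your data; the variable name row_dicts is made up for illustration.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, "30days", 1000.0, "Scala"),
    ("PySpark", 25000, "50days", 2300.0, "Python"),
    ("Hadoop", 23000, "55days", np.nan, np.nan),
]
df = pd.DataFrame(
    technologies,
    columns=["Courses", "Fee", "Duration", "Discount", "Language"],
)

# iterrows yields (index, Series) pairs; dropna removes the NaN entries
# from each row Series, and to_dict turns what is left into a plain dict.
row_dicts = [row.dropna().to_dict() for _, row in df.iterrows()]

for d in row_dicts:
    print(d)
```

For large frames the to_dict('records') comprehension is likely the faster option, since iterrows builds a new Series for every row.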