I have the following data, produced by running some analysis, with the results stored in a defaultdict(list). Afterwards I write the results to a CSV file. First, I'd like to remove the items whose 'Check2' value is nan.
How would I remove those items from the lists of dicts?
from numpy import nan
from collections import defaultdict
d = defaultdict(list,
{'Address_1': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1' : 8,
'Check2' : 1},
{'Name': 'name',
'Address_match': 'address_match_2',
'ID': 'id',
'Type': 'abc',
'Check1' : 20,
'Check2' : nan},
{'Name': 'name',
'Address_match': 'address_match_3',
'ID': 'id',
'Type': 'abc',
'Check1' : 27,
'Check2' : nan}],
'Address_2': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1' : 30,
'Check2' : 1},
{'Name': 'name',
'Address_match': 'address_match_2',
'ID': 'id',
'Type': 'abc',
'Check1' : 38,
'Check2' : nan},
{'Name': 'name',
'Address_match': 'address_match_3',
'ID': 'id',
'Type': 'abc',
'Check1' : 12,
'Check2' : nan}]})
Afterwards my results should be:
d = defaultdict(list,
{'Address_1': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1' : 8,
'Check2' : 1}],
'Address_2': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1' : 30,
'Check2' : 1}
]})
CodePudding user response:
You can do something like this:
import math
def remove_nan_att(d, att):
    # For every key, keep only the records whose `att` value is not NaN.
    return {key: [o for o in d[key] if not math.isnan(o[att])] for key in d}
d = remove_nan_att(d, 'Check2')
This goes over the dict and, for each key, goes over its list and keeps only the items whose value for the wanted attribute is not nan. Note that the result is a plain dict, not a defaultdict.
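The comprehension above returns a plain dict, and math.isnan raises a TypeError if the attribute holds a non-numeric value such as a string. As a minimal sketch, assuming you want to keep the defaultdict(list) type and treat non-float or missing values as "not nan", something like the following could work. The names is_nan and drop_nan_records are hypothetical helpers, not part of any library.

```python
import math
from collections import defaultdict

def is_nan(value):
    """Return True only for float NaN; any non-float value counts as not NaN."""
    return isinstance(value, float) and math.isnan(value)

def drop_nan_records(groups, att):
    """Return a new defaultdict(list) without the records whose `att` is NaN."""
    cleaned = defaultdict(list)
    for key, records in groups.items():
        # Assumption: a record that lacks `att` is kept (rec.get returns None).
        cleaned[key] = [rec for rec in records if not is_nan(rec.get(att))]
    return cleaned

# Small sample in the same shape as the question's data.
sample = defaultdict(list, {
    'Address_1': [{'Check1': 8, 'Check2': 1},
                  {'Check1': 20, 'Check2': float('nan')}],
})
result = drop_nan_records(sample, 'Check2')
print(result['Address_1'])  # [{'Check1': 8, 'Check2': 1}]
```

Because numpy's float64 is a subclass of Python's float, the isinstance check should also cover nan values that come out of numpy computations.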
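In case nan is from numpy, you can also compare by identity against the imported constant (math.isnan above works for numpy's nan too, since it is an ordinary float):
from numpy import nan
def remove_nan_att(d, att):
    # Identity check: only matches the exact `nan` object imported above.
    return {key: [o for o in d[key] if o[att] is not nan] for key in d}
d = remove_nan_att(d, 'Check2')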
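One caveat with the identity check: `is` compares objects, so it only matches the exact nan object that was imported. A NaN created some other way (for example by a computation or by float('nan')) is a different object and would slip through. A small sketch of the difference, using math.nan as a stand-in for the numpy constant so that it runs with the standard library only (that substitution is an assumption for illustration):

```python
import math
from math import nan  # stand-in for `from numpy import nan` (assumption)

other_nan = float('nan')  # a NaN created independently of the constant

same_object = other_nan is nan    # identity: different objects
equal = other_nan == nan          # equality: NaN never compares equal
detected = math.isnan(other_nan)  # value check: recognises any float NaN

print(same_object, equal, detected)  # False False True
```

If the values in Check2 may come from calculations rather than from the literal constant, math.isnan is the safer test.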
CodePudding user response:
Try:
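import pandas as pd
# One entry per (address, position) pair; each value is a record dict.
df = pd.DataFrame.from_records(d).unstack()
# Keep entries whose 'Check2' is not NaN, then rebuild the dict of lists.
d = df[df.str['Check2'].notna()].unstack(level=0).to_dict('list')
print(d)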
# Output:
{'Address_1': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1': 8,
'Check2': 1}],
'Address_2': [{'Name': 'name',
'Address_match': 'address_match_1',
'ID': 'id',
'Type': 'abc',
'Check1': 30,
'Check2': 1}]}