Imagine I have a list of dictionaries as follows:
[{'source_id': '1', 'source_desc': 'XML1'},
{'source_id': '2', 'source_desc': 'XML2'},
{'source_id': '3', 'source_desc': 'XML3'}]
Now I have a pandas dataframe as:
di = [{'source_id': '1','city':'NY'},
{'source_id': '2','city':'Santiago'},
{'source_id': '3','city':'Berlin'}]
dff=pd.DataFrame(di)
dff
Now I want to add a column source_desc based on the previous collection of dictionaries. I tried:
dff['source_desc'] = dff['source_id'].astype(str).apply(lambda x: d['source_desc'] for d in di if d['source_id'] == x)
I dont understand why this does not work.
EDIT: desired output would be a pandas df like:
desired = [{'source_id': '1','city':'NY','source_desc': 'XML1' },
{'source_id': '2','city':'Santiago','source_desc': 'XML2'},
{'source_id': '3','city':'Berlin','source_desc': 'XML3'}]
desired_df=pd.DataFrame(desired)
desired_df
CodePudding user response:
It becomes a generator and you need next
:
dff['source_desc'] = dff['source_id'].apply(lambda x: next((y['source_desc'] for y in d if y['source_id'] == x)))
>>> dff
source_id city source_desc
0 1 NY XML1
1 2 Santiago XML2
2 3 Berlin XML3
>>>
CodePudding user response:
Just dataframe and merge. Code below
dff.merge(pd.DataFrame([{'source_id': '1', 'source_desc': 'XML1'},
{'source_id': '2', 'source_desc': 'XML2'},
{'source_id': '3', 'source_desc': 'XML3'}]), how='left', on='source_id')
Output
source_id city source_desc
0 1 NY XML1
1 2 Santiago XML2
2 3 Berlin XML3
CodePudding user response:
Better if you let pandas handle the read method, then just use a merge
method.
source_json = [{'source_id': '1', 'source_desc': 'XML1'},
{'source_id': '2', 'source_desc': 'XML2'},
{'source_id': '3', 'source_desc': 'XML3'}]
df = dff.merge(pd.json_normalize(source_json), on=['source_id'])
source_id city source_desc
0 1 NY XML1
1 2 Santiago XML2
2 3 Berlin XML3
CodePudding user response:
You can use merge
like below:
>>> dct = [{'source_id': '1', 'source_desc': 'XML1'},{'source_id': '2', 'source_desc': 'XML2'},{'source_id': '3', 'source_desc': 'XML3'}]
>>> di = [{'source_id': '1','city':'NY'},{'source_id': '2','city':'Santiago'},{'source_id': '3','city':'Berlin'}]
>>> dff=pd.DataFrame(di)
>>> dff
source_id city
0 1 NY
1 2 Santiago
2 3 Berlin
>>> pd.merge(dff, pd.DataFrame(dct), on="source_id")
source_id city source_desc
0 1 NY XML1
1 2 Santiago XML2
2 3 Berlin XML3