Home > Mobile >  apply function to pandas column based on list of dictionaries
apply function to pandas column based on list of dictionaries

Time:09-27

Imagine I have a list of dictionaries as follows:

[{'source_id': '1', 'source_desc': 'XML1'},
 {'source_id': '2', 'source_desc': 'XML2'},
 {'source_id': '3', 'source_desc': 'XML3'}]

Now I have a pandas dataframe as:

di = [{'source_id': '1','city':'NY'},
       {'source_id': '2','city':'Santiago'},
       {'source_id': '3','city':'Berlin'}]
dff=pd.DataFrame(di)
dff

Now I want to add a column source_desc based on the previous collection of dictionaries. I tried:

dff['source_desc'] = dff['source_id'].astype(str).apply(lambda x: d['source_desc'] for d in di if d['source_id'] == x)

I dont understand why this does not work.

EDIT: desired output would be a pandas df like:

desired = [{'source_id': '1','city':'NY','source_desc': 'XML1' },
       {'source_id': '2','city':'Santiago','source_desc': 'XML2'},
       {'source_id': '3','city':'Berlin','source_desc': 'XML3'}]
desired_df=pd.DataFrame(desired)
desired_df

CodePudding user response:

It becomes a generator and you need next:

dff['source_desc'] = dff['source_id'].apply(lambda x: next((y['source_desc'] for y in d if y['source_id'] == x)))

>>> dff
  source_id      city source_desc
0         1        NY        XML1
1         2  Santiago        XML2
2         3    Berlin        XML3
>>> 

CodePudding user response:

Just dataframe and merge. Code below

dff.merge(pd.DataFrame([{'source_id': '1', 'source_desc': 'XML1'},
 {'source_id': '2', 'source_desc': 'XML2'},
 {'source_id': '3', 'source_desc': 'XML3'}]), how='left', on='source_id')

Output

    source_id      city source_desc
0         1        NY        XML1
1         2  Santiago        XML2
2         3    Berlin        XML3

CodePudding user response:

Better if you let pandas handle the read method, then just use a merge method.

source_json = [{'source_id': '1', 'source_desc': 'XML1'},
 {'source_id': '2', 'source_desc': 'XML2'},
 {'source_id': '3', 'source_desc': 'XML3'}]

df = dff.merge(pd.json_normalize(source_json), on=['source_id'])

  source_id      city source_desc
0         1        NY        XML1
1         2  Santiago        XML2
2         3    Berlin        XML3

CodePudding user response:

You can use merge like below:

>>> dct = [{'source_id': '1', 'source_desc': 'XML1'},{'source_id': '2', 'source_desc': 'XML2'},{'source_id': '3', 'source_desc': 'XML3'}]

>>> di = [{'source_id': '1','city':'NY'},{'source_id': '2','city':'Santiago'},{'source_id': '3','city':'Berlin'}]
>>> dff=pd.DataFrame(di)
>>> dff
    source_id   city
0          1    NY
1          2    Santiago
2          3    Berlin

>>> pd.merge(dff, pd.DataFrame(dct), on="source_id")

    source_id   city    source_desc
0          1    NY          XML1
1          2    Santiago    XML2
2          3    Berlin      XML3
  • Related