How can I iterate over a list of dicts inside a pandas dataframe?-CodePudding

I have a pandas DataFrame looking something like this:

    | ID       | col1 | col2 |     col3    |
    | -------- | ---- | ---- | ------------|
    | ID1      | val1 | val1 | [{..},{..}] |
    | ID2      | val2 | val2 | [{..},{..}] |

the rows of col3 consist of a list of dicts looking something like this:

    [{'key1':'3', 'key2':'8', 'key3':'9'},
     {'key1':'3', 'key2':'8', 'key3':'7'},
     {'key1':'2', 'key2':'1', 'key3':'9'}]

The keys within all dicts are the 'same' and I'm trying to add a new column "col4" to my df with a count of e.g. how often the number 8 as value apears in each dict with key2 as key, similar like:

    | ID       | col1 | col2 |     col3    | col4 |
    | -------- | ---- | ---- | ------------| ---- |
    | ID1      | val1 | val1 | [{..},{..}] |  2   |
    | ID2      | val2 | val2 | [{..},{..}] |  5   |

I don't have any idea how to even start solving this problem, how can I iterate over a list of dicts inside a pandas dataframe?

CodePudding user response：

Try using a combination of explode, pd.json_normalize, unstack, and value_counts:

val = '8'
df['col4'] = df['a'].explode().pipe(lambda x: pd.json_normalize(x).set_index(x.index)).groupby(level=0).apply(lambda g: g.unstack().value_counts()[val])

But first you might need to convert the data in your column to actually dicts/lists, because they might just be strings that look look dicts/lists:

import ast
df['col4'] = df['col4'].apply(ast.literal_eval)