I have a pandas DataFrame looking something like this:
| ID | col1 | col2 | col3 |
| -------- | ---- | ---- | ------------|
| ID1 | val1 | val1 | [{..},{..}] |
| ID2 | val2 | val2 | [{..},{..}] |
the rows of col3 consist of a list of dicts looking something like this:
[{'key1':'3', 'key2':'8', 'key3':'9'},
{'key1':'3', 'key2':'8', 'key3':'7'},
{'key1':'2', 'key2':'1', 'key3':'9'}]
The keys within all dicts are the 'same' and I'm trying to add a new column "col4" to my df with a count of e.g. how often the number 8 as value apears in each dict with key2 as key, similar like:
| ID | col1 | col2 | col3 | col4 |
| -------- | ---- | ---- | ------------| ---- |
| ID1 | val1 | val1 | [{..},{..}] | 2 |
| ID2 | val2 | val2 | [{..},{..}] | 5 |
I don't have any idea how to even start solving this problem, how can I iterate over a list of dicts inside a pandas dataframe?
CodePudding user response:
Try using a combination of explode
, pd.json_normalize
, unstack
, and value_counts
:
val = '8'
df['col4'] = df['a'].explode().pipe(lambda x: pd.json_normalize(x).set_index(x.index)).groupby(level=0).apply(lambda g: g.unstack().value_counts()[val])
But first you might need to convert the data in your column to actually dicts/lists, because they might just be strings that look look dicts/lists:
import ast
df['col4'] = df['col4'].apply(ast.literal_eval)