This is a sample of my dataframe (in reality it has more columns):
| | _id | answers | extraColumn |
|---|---|---|---|
| 0 | a | [{'title': 'dog', 'value': 'True'}, {'title': 'cat', 'value': 'False'}, {'title': 'bird', 'value': 'False'}] | something |
| 1 | b | [{'title': 'food', 'value': 'False'}, {'title': 'water', 'value': 'True'}, {'title': 'wine', 'value': 'False'}] | nothing |
| 2 | c | [] | [] |
| 3 | d | [] | 22 |
I want to add an extra column that holds the total string length of the 'title' values in each row's answers list. So for the first row it would be 10 ("dog" + "cat" + "bird" = 3 + 3 + 4), then 13, then 0, then 0.
I tried
parsed = df.groupby('_id').answers.apply(lambda x: pd.DataFrame(df.values[0])).reset_index()
but it completely messed up my dataset and somehow parsed my extra column instead. I was thinking of just creating an extra dataframe out of these dictionaries and calculating the string length with something as simple as
df['Length'] = df['title'].str.len()
Is that possible?
CodePudding user response:
Try:
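df = df.join(df['answers'].explode().apply(pd.Series)['title'])  # one row per title; NaN where answers is empty
df['Length'] = df.groupby('_id')['title'].transform(lambda x: len(x.str.cat(sep='')))  # new column, so the existing extraColumn is not overwritten
df.drop_duplicates(subset=['_id'], inplace=True)  # back to one row per _id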
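If you would rather build the separate dataframe of titles that the question describes, here is a minimal sketch of one way to do it. It assumes `answers` always holds a list of dicts that each have a 'title' key, rebuilds the sample data from the question, and uses `titles` as a purely illustrative name for the helper frame:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "_id": ["a", "b", "c", "d"],
    "answers": [
        [{"title": "dog", "value": "True"},
         {"title": "cat", "value": "False"},
         {"title": "bird", "value": "False"}],
        [{"title": "food", "value": "False"},
         {"title": "water", "value": "True"},
         {"title": "wine", "value": "False"}],
        [],
        [],
    ],
    "extraColumn": ["something", "nothing", [], 22],
})

# One row per answer dict; empty lists explode to NaN and are dropped here
exploded = df["answers"].explode().dropna()

# Helper frame with one title per row, indexed by the original row label
titles = pd.DataFrame({"title": exploded.map(lambda d: d["title"])})
titles["Length"] = titles["title"].str.len()

# Sum per original row; rows that had no answers get 0
df["Length"] = (
    titles.groupby(level=0)["Length"].sum().reindex(df.index, fill_value=0)
)

print(df[["_id", "Length"]])
```

Because the sums are aligned back on the original index, the main dataframe keeps one row per `_id` and no `drop_duplicates` step is needed.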
CodePudding user response:
A simple list comprehension would suffice:
df['answers'].map(lambda l: sum([len(d['title']) for d in l]))
0 10
1 13
2 0
3 0
Name: answers, dtype: int64
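To store the result as a new column, the same idea can be wrapped in a small function. This is only a sketch: `total_title_length` is a made-up helper name, and the guards for non-list cells and for dicts without a 'title' key are assumptions about what the real data might contain, not something shown in the sample.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "_id": ["a", "b", "c", "d"],
    "answers": [
        [{"title": "dog", "value": "True"},
         {"title": "cat", "value": "False"},
         {"title": "bird", "value": "False"}],
        [{"title": "food", "value": "False"},
         {"title": "water", "value": "True"},
         {"title": "wine", "value": "False"}],
        [],
        [],
    ],
    "extraColumn": ["something", "nothing", [], 22],
})


def total_title_length(answers):
    """Sum the lengths of all 'title' strings in a list of answer dicts."""
    # Assumption: anything that is not a list (e.g. NaN) counts as no answers
    if not isinstance(answers, list):
        return 0
    # Assumption: a dict without a 'title' key contributes 0
    return sum(len(d.get("title", "")) for d in answers)


df["Length"] = df["answers"].map(total_title_length)
print(df[["_id", "Length"]])
```

This leaves the other columns untouched and does not change the number of rows.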