I want to convert this DataFrame into a dictionary in which each label is a key and its value is the list of all tweets with that label. Can someone help?
CodePudding user response:
Assuming your data frame is stored in a variable named "df", the following may help:
temp = df.groupby(['labels']).apply(lambda x: x['tweets'].tolist()).to_dict()
print(temp)
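For a quick check, here is a minimal self-contained sketch of the same call. The sample rows and their order are an assumption made up for illustration (the question's actual data is not shown here); only the column names "labels" and "tweets" come from the question.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the question.
df = pd.DataFrame({
    "labels": ["EXP", "QUE", "STM", "EXP", "STM"],
    "tweets": [
        "if you missed",
        "the neverending",
        "katie couric",
        "the emotional",
        "a yearold nigerian",
    ],
})

# For each label, collect that group's tweets into a plain Python list,
# then turn the resulting label-indexed Series into a dictionary.
temp = df.groupby(["labels"]).apply(lambda x: x["tweets"].tolist()).to_dict()
print(temp)
```

Each key is one label, and each value keeps the tweets in the order they appear in the frame.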
CodePudding user response:
To get your expected result, you can run, for example:
result = df.groupby('labels')['tweets'].apply(list).to_dict()
Details:
df.groupby('labels')
- groups the source rows by label.
['tweets']
- takes only the tweets column (from each group).
.apply(list)
- converts the tweets from the current group into a list. You don't even need an explicit lambda function. The result so far (of groupby and apply) is a pandas Series indexed by label.
.to_dict()
- converts this Series to a dictionary.
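The chain can also be split into separate steps to see what each one returns. This is only a sketch: the sample rows and the intermediate variable names (grouped, tweets, lists) are assumptions for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sample data; the row order is assumed.
df = pd.DataFrame({
    "labels": ["EXP", "QUE", "STM", "EXP", "STM"],
    "tweets": [
        "if you missed",
        "the neverending",
        "katie couric",
        "the emotional",
        "a yearold nigerian",
    ],
})

grouped = df.groupby("labels")   # one group of rows per distinct label
tweets = grouped["tweets"]       # keep only the tweets column of each group
lists = tweets.apply(list)       # Series: index = label, value = list of tweets
result = lists.to_dict()         # plain dict: label -> list of tweets

print(type(lists).__name__)      # Series
print(result["EXP"])             # ['if you missed', 'the emotional']
```

Passing the built-in list directly to apply works because each group's tweets column is itself a Series, and list(series) yields its values.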
For your source data (shortened a bit) the result is:
{'EXP': ['if you missed', 'the emotional'],
'QUE': ['the neverending'],
'STM': ['katie couric', 'a yearold nigerian']}