I have a dataframe and a sample of it looks like this
review_id ngram date rating attraction indo
4 bigram 2021 10 uss sangat lengkap
359 bigram 2019 10 uss sangat lengkap
911 bigram 2018 10 uss sangat lengkap
977 bigram 2018 10 uss sangat lengkap
1062 bigram 2019 10 uss agak bingung
2919 bigram 2019 9 uss agak bingung
3531 bigram 2018 10 uss sangat lengkap
4282 bigram 2019 10 sea_aquarium sangat lengkap
I would like to extract the review_id into a list for each word in indo column such that the output would be something like this
I tried the following code but it does not work as it returns the review_id of all counts that are more that one which may or may not be the same words in the indo column.
df_sentiment['count'] = df_sentiment['indo'].value_counts()
def get_all_review_id():
all_review_id = []
for i in range(len(df_sentiment)):
if df_sentiment['count'][i] > 1:
all_review_id.append(df_sentiment['review_id'][i])
return all_review_id
df_sentiment["all_review_id"] = df_sentiment['indo'].progress_apply(lambda x: get_all_review_id(x))
Any suggestions / code I can use? Thank you!
CodePudding user response:
if you share the data, I can reproduce and add the result
This hopefully will answer your question
df.groupby(['ngram','date','rating','attraction','indo'])['review_id'].agg(list).reset_index()
ngram date rating attraction indo review_id
0 bigram 2018 10 uss sangat lengkap [911, 977, 3531]
1 bigram 2019 9 uss agak bingung [2919]
2 bigram 2019 10 sea_aquarium sangat blengkap [4282]
3 bigram 2019 10 uss agak bingung [1062]
4 bigram 2019 10 uss sangat lengkap [359]
5 bigram 2021 10 uss sangat lengkap [4]