I thought this would be easy, but I'm having trouble finding an answer.
I want to count unique words in each column cell. If the same word repeats in the same cell, I want to count it only once.
For example:
1st: "I waited and waited and eventually left the hospital"
2nd: "I waited only 1 hour. My experience wasn't so bad"
What I want:
- waited: 2 (even though "waited" appears twice in the first cell, I want to count it only once there, so the total is 2: one from the 1st cell, one from the 2nd)
- hospital: 1
- experience: 1, and so on...
I tried this code, but it counts every occurrence, so "waited" comes out as 3 instead of 2:
Reviews_Freq_Words=Reviews.ReviewText2.apply(lambda x: pd.value_counts(x.split(" "))).sum(axis = 0)
Any thoughts?
CodePudding user response:
I came up with two different methods. I'm not sure which one performs better, so try both on your own data.
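# Method 1: set() removes repeats within a cell (pd.Series(...).value_counts() replaces the top-level pd.value_counts, which is deprecated in recent pandas).
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.Series(list(set(x.split(" ")))).value_counts()).sum(axis = 0)
# Method 2: same idea, but drop_duplicates() keeps first-seen order and split() with no argument also collapses repeated whitespace.
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.Series(x.split()).drop_duplicates().value_counts()).sum(axis = 0)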
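If the per-row apply turns out to be slow on a large column, another option is to feed each cell's set of words into a collections.Counter. This is only a sketch and I haven't benchmarked it: the sample Series `reviews` is a stand-in for Reviews.ReviewText2, and it assumes every cell is a string.

```python
from collections import Counter

import pandas as pd

# Stand-in for Reviews.ReviewText2 (assumption: every cell is a string).
reviews = pd.Series([
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
])

counter = Counter()
for text in reviews:
    # set() makes each word count at most once per cell.
    counter.update(set(text.split()))

# Convert to a Series so it can be sorted and sliced like the other results.
word_counts = pd.Series(counter).sort_values(ascending=False)
print(word_counts.head())
```

Here `word_counts["waited"]` is 2, one for each cell that contains the word.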
CodePudding user response:
If I understand correctly, each column cell holds one sentence?
I'm new to pandas too, so I just tried it out. This worked for me:
import pandas as pd
data = ["I waited and waited and eventually left the hospital","I waited only 1 hour. My experience wasn't so bad"]
df = pd.DataFrame(data, columns=['sentences'])
result = df['sentences'].apply(lambda x: list(set(x.split(' ')))).explode().value_counts()
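One thing to watch with a plain split: "hour." keeps its period, and "My" and "my" would count as different words. Here is a minimal sketch of one way to normalize first. The regex is my own assumption (it keeps lowercase ASCII letters, digits and apostrophes), so adjust it to your data.

```python
import pandas as pd

data = [
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
]
df = pd.DataFrame(data, columns=["sentences"])

result = (
    df["sentences"]
    .str.lower()                                  # "My" and "my" become the same word
    .str.findall(r"[a-z0-9']+")                   # pull out words, dropping punctuation
    .apply(lambda words: sorted(set(words)))      # keep each word once per cell
    .explode()                                    # one row per (cell, word)
    .value_counts()                               # number of cells containing each word
)
print(result)
```

With this version "hour" is counted without the trailing period, and "waited" still comes out as 2.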