Python/DataFrame: Count Unique Words in Each Column Cell (Not Counting Repeated Words in the Same Cell)

Time: 04-07

I thought this would be easy, but I'm having trouble finding an answer.

I want to count unique words in each column cell. If the same word repeats in the same cell, I want to count it only once.

For example:

1st: "I waited and waited and eventually left the hospital"

2nd: "I waited only 1 hour. My experience wasn't so bad"

What I want:

  • waited: 2 (even though "waited" appears twice in the first cell, I want to count it only once, so the total is 2: one from the 1st cell and one from the 2nd)
  • hospital: 1
  • experience: 1, and so on...

I tried this code, but it counts a word every time it appears in a cell:

Reviews_Freq_Words=Reviews.ReviewText2.apply(lambda x: pd.value_counts(x.split(" "))).sum(axis = 0)
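The attempt over-counts because value_counts() sees every occurrence inside a cell. A minimal sketch that reproduces the same per-word totals on the two sample sentences, using str.split() plus explode() instead of a per-cell apply (the sample DataFrame is built here for illustration; only the column name ReviewText2 comes from the question):

```python
import pandas as pd

# Sample data from the question; "ReviewText2" mirrors the column name in the attempt.
Reviews = pd.DataFrame({"ReviewText2": [
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
]})

# Count every occurrence across all cells, repeats included.
# This gives the same per-word totals as summing per-cell value_counts().
all_counts = Reviews.ReviewText2.str.split(" ").explode().value_counts()

print(all_counts["waited"])  # 3: both occurrences in the first review are counted
```

The fix is therefore to drop repeats inside each cell before counting.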

Any thoughts?

Answer:

I came up with two different methods. Performance-wise, I'm not sure which one is better, but you can try them out yourself.

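# Method 1: set() drops repeated words within each cell before counting
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.value_counts(list(set(x.split(" "))))).sum(axis=0)
# Method 2: pd.unique() does the same; split() with no argument splits on any whitespace
Reviews_Freq_Words = Reviews.ReviewText2.apply(lambda x: pd.value_counts(pd.unique(x.split()))).sum(axis=0)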
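Both methods rest on the same idea: deduplicate each cell, then count. As a plain-Python cross-check (not part of the answer above), the same per-review counting can be sketched with collections.Counter; the list reviews is hypothetical sample data copied from the question:

```python
from collections import Counter

reviews = [
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
]

# Each review contributes at most 1 to a word's count:
# set() removes repeats before Counter.update() tallies them.
doc_freq = Counter()
for text in reviews:
    doc_freq.update(set(text.split()))

print(doc_freq["waited"], doc_freq["hospital"], doc_freq["experience"])  # 2 1 1
```

If you keep the pandas versions above, note that recent pandas releases deprecate the top-level pd.value_counts(); pd.Series(...).value_counts() gives the same counts.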

Answer:

If I understand correctly, each cell in the column holds a sentence?

I'm new to pandas too, so I just tried it out. This worked for me:

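import pandas as pd

data = ["I waited and waited and eventually left the hospital", "I waited only 1 hour. My experience wasn't so bad"]
df = pd.DataFrame(data, columns=['sentences'])

# Split each sentence into words, keep each word once per cell (set),
# flatten the per-cell lists into one Series (explode), then count.
result = df['sentences'].apply(lambda x: list(set(x.split(' ')))).explode().value_counts()
print(result)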
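Splitting on spaces keeps punctuation and case, so "hour." and "hour", or "I" and "i", would be counted as different words. A minimal sketch of a normalized variant, assuming you want case-insensitive words made of letters, digits and apostrophes (the regex is an assumption; adjust it to your data):

```python
import pandas as pd

df = pd.DataFrame({"sentences": [
    "I waited and waited and eventually left the hospital",
    "I waited only 1 hour. My experience wasn't so bad",
]})

# Lowercase, then pull out runs of letters, digits and apostrophes,
# so "hour." becomes "hour" and "I"/"i" are treated as the same word.
tokens = df["sentences"].str.lower().str.findall(r"[a-z0-9']+")

# map(set) keeps each word once per cell; explode() accepts sets as well as lists.
result = tokens.map(set).explode().value_counts()

print(result.head())
```

Because sets are unordered, ties in the output may appear in any order; only the counts are meaningful.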