I have a text string, `text = """Alice has two apples and bananas. Apples are very healthy."""`,
and a DataFrame:
word |
---|
apples |
bananas |
company |
I would like to add a column "frequency" that counts how many times each word in the "word" column occurs in the text, ignoring case (so "Apples" and "apples" both count).
The expected output is:
word | frequency |
---|---|
apples | 2 |
bananas | 1 |
company | 0 |
Answer:
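```python
import pandas as pd

df = pd.DataFrame(['apples', 'bananas', 'company'], columns=['word'])
# Lowercase the text so "Apples" and "apples" are counted together
para = "Alice has two apples and bananas. Apples are very healthy.".lower()
# str.count counts non-overlapping substring occurrences of each word
df['frequency'] = df['word'].apply(lambda x: para.count(x.lower()))
print(df)
```

Output:

```
      word  frequency
0   apples          2
1  bananas          1
2  company          0
```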
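One caveat with the answer above: `str.count` counts substrings, so a word that appears inside a longer word (for example "apples" inside "pineapples") is counted too. A minimal sketch of a whole-word variant, assuming that regex word boundaries (`\b`) match the notion of "word" you need; `count_whole_word` is a hypothetical helper name introduced here:

```python
import re

import pandas as pd

df = pd.DataFrame({"word": ["apples", "bananas", "company"]})
para = "Alice has two apples and bananas. Apples are very healthy.".lower()


def count_whole_word(word: str, text: str) -> int:
    """Count occurrences of `word` in `text`, matching whole words only."""
    # \b is a word boundary; re.escape guards against regex metacharacters in `word`
    pattern = rf"\b{re.escape(word.lower())}\b"
    return len(re.findall(pattern, text))


df["frequency"] = df["word"].apply(lambda w: count_whole_word(w, para))
print(df)

# Substring counting vs. whole-word counting when the word is part of another word
print("pineapples".count("apples"))              # 1 (substring match)
print(count_whole_word("apples", "pineapples"))  # 0 (no whole-word match)
```

For the sample text both approaches give 2, 1 and 0; they only differ when a target word is embedded in a longer word.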
Answer:
- Convert the text to lowercase, then use a regular expression to split it into a list of words. You might check out this page for learning purposes.
- Apply a lambda function to each value in the "word" column to count how many times that word appears in the list created in the previous step.
```python
# Import and create the data
import re

import pandas as pd

text = """Alice has two apples and bananas. Apples are very healthy."""
df = pd.DataFrame(data={'word': ['apples', 'bananas', 'company']})

# Solution
# \w+ matches a run of word characters, giving one token per word
words_list = re.findall(r'\w+', text.lower())
df['frequency'] = df['word'].apply(lambda x: words_list.count(x.lower()))
```
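A possible refinement of this answer (not part of the original): `list.count` rescans the whole word list for every row. A minimal sketch using `collections.Counter` tokenizes and counts the text once, then looks each word up; it assumes the same `\w+` tokenization as above:

```python
import re
from collections import Counter

import pandas as pd

text = """Alice has two apples and bananas. Apples are very healthy."""
df = pd.DataFrame(data={"word": ["apples", "bananas", "company"]})

# Tokenize the lowercased text once and count every token in a single pass
counts = Counter(re.findall(r"\w+", text.lower()))

# Indexing a Counter with a missing key returns 0, so "company" gets 0
df["frequency"] = df["word"].map(lambda w: counts[w.lower()])
print(df)
```

The result is the same as with `words_list.count`, but the text is scanned only once regardless of how many rows the DataFrame has.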