How do I count the number of occurrences of each word in a .txt file, load the result into a pandas DataFrame with the columns name and count, and sort the DataFrame by the count column?
CodePudding user response:
Suppose test.txt contains this data:
stack monkey zimbra
flow zimbra zimbra help Edit Name
Name
You can do it like this:
import pandas as pd

# Create an empty dictionary
dic = dict()
# Open the file in read mode; the with block closes it afterwards
with open("test.txt", "r") as text:
    # Loop through each line of the file
    for line in text:
        # Remove surrounding whitespace and the newline character
        line = line.strip()
        # Convert the characters in line to
        # lowercase to avoid case mismatch
        line = line.lower()
        # Split the line into words on any run of whitespace
        words = line.split()
        # Iterate over each word in line
        for word in words:
            # Check if the word is already in dictionary
            if word in dic:
                # Increment count of word by 1
                dic[word] = dic[word] + 1
            else:
                # Add the word to dictionary with count 1
                dic[word] = 1
# Convert dict into a dataframe (named df so the pd alias is not overwritten)
df = pd.DataFrame(list(dic.items()), columns=['Name', 'Occurrence'])
print(df)
Output:
     Name  Occurrence
0   stack           1
1  monkey           1
2  zimbra           3
3    flow           1
4    help           1
5    edit           1
6    name           2
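The question also asks for the result to be sorted by count, which the output above is not. A minimal sketch of that step, assuming the three sample lines of test.txt are inlined as a string (the variable `sample` is hypothetical) and using `collections.Counter` in place of the hand-written dictionary loop:

```python
from collections import Counter

import pandas as pd

# Assumption: the sample contents of test.txt, inlined so the sketch
# runs without a file on disk.
sample = """stack monkey zimbra
flow zimbra zimbra help Edit Name
Name"""

# str.split() with no argument splits on any whitespace, newlines included.
counts = Counter(sample.lower().split())

df = pd.DataFrame(list(counts.items()), columns=["name", "count"])

# Highest counts first; ties are broken alphabetically by name so the
# order is deterministic.
df = df.sort_values(["count", "name"], ascending=[False, True]).reset_index(drop=True)
print(df)
```

Sorting on both columns is a design choice here, not a requirement: sorting on "count" alone also works, but leaves the order of equally frequent words up to the sort algorithm.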
CodePudding user response:
Use nltk:
# pip install nltk
from nltk.tokenize import RegexpTokenizer
from nltk import FreqDist
import pandas as pd
text = """How do I count the number of occurrences of each word in a .txt file and also load it into the pandas dataframe with columns name and count, also sort the dataframe on column count?"""
tokenizer = RegexpTokenizer(r'\w+')
words = tokenizer.tokenize(text)
sr = pd.Series(FreqDist(words))
Output:
>>> sr
How            1
do             1
I              1
count          3
the            3
number         1
of             2
occurrences    1
each           1
word           1
in             1
a              1
txt            1
file           1
and            2
also           2
load           1
it             1
into           1
pandas         1
dataframe      2
with           1
columns        1
name           1
sort           1
on             1
column         1
dtype: int64
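The result above is a Series, while the question asks for a DataFrame with name and count columns sorted by count. One possible way to get there is sketched below. To keep it runnable without nltk installed, it assumes `re.findall` with the same `\w+` pattern plus `collections.Counter` as stand-ins for `RegexpTokenizer` and `FreqDist`; the conversion steps apply to any Series of counts.

```python
import re
from collections import Counter

import pandas as pd

text = """How do I count the number of occurrences of each word in a .txt file and also load it into the pandas dataframe with columns name and count, also sort the dataframe on column count?"""

# Assumption: re.findall + Counter stand in for nltk's RegexpTokenizer
# and FreqDist so that this sketch has no nltk dependency.
words = re.findall(r"\w+", text)
sr = pd.Series(Counter(words))

df = (
    sr.rename_axis("name")          # name the index so it becomes the "name" column
      .reset_index(name="count")    # Series -> DataFrame with columns name, count
      .sort_values(["count", "name"], ascending=[False, True])
      .reset_index(drop=True)
)
print(df.head(6))
```

Note that the tokens are not lowercased here, matching the answer above, so "How" and "how" would be counted as different words.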