I've got some code my professor and I went over because I honestly don't know what I'm doing anymore. Our project was to write a script that counts how many times each word appears in a text file. He was really helpful walking me through the code and explaining it, but when we ran it, the script didn't account for words that appear only once. How do I make it so that words that appear only once get a count of 1, while the rest of the words keep their counts?
What we got:
the: 3
history: 0
learning: 4
vs. what we needed:
the: 3
history: 1
learning: 4
My most obvious answer is to simply do count = 1, but that also bumps up the rest of the numbers. I'm assuming this has something to do with elif statements?
Here's the code:
def get_file_name(title):
    file_name = input(title)
    return file_name

def read_file_contents(file_name):
    #The purpose of this function is to read the contents of the
    #file that contains the words
    #First thing that we need to do is to create a Python list to
    #hold the words that are in the file
    word_list = []
    #We must always assume that the user can make a typo when
    #entering a file name. Having a program
    #"crash" is definitely not user-friendly, so we encase the
    #reading of the file within a try/except block
    #so that if an error on opening the file occurs, we can
    #"end our program" very gracefully with
    #an error message
    try:
        with open(file_name,'r') as input_file:
            #we will read the file, one line at a time,
            #until we have read all of the lines in the file.
            #As we read each line, we will first remove the
            #"new line character" that is at the end of each
            #line. Then, we will split the line into words
            #(words are preceded and followed by spaces), and
            #then append each word to our list of words
            for line in input_file:
                line = line.rstrip()
                words = line.split(' ')
                for word in words:
                    word_list.append(word)
    except:
        # If the file cannot be found,
        #then an "error message" is printed and it quits the program
        print("A major error has occurred!")
        quit()
    #Now, we need to return the list of words so
    #that we can do the counting.
    return word_list

def establish_word_frequency(word_list):
    word_list.sort()
    #create a list that will contain the words and their count.
    #This list will hold strings.
    #set a count variable to 0
    frequency = []
    count = 0
    #set a previous word variable to be the empty string
    prev_word = ''
    #For each word in the word_list
    for i in range(len(word_list)):
        #see if the current word is the same as the previous word
        if word_list[i] == prev_word:
            #If it is, add one to the current count
            count += 1
        #otherwise (meaning the word is different)
        else:
            #concatenate the current word with a
            #space, colon, space and the string equivalent of the
            #integer count (use str() for this)
            #Append the word and its count to the frequency list
            frequency.append(word_list[i] + ' : ' + str(count))
            #set the count back to zero
            count = 0
            prev_word = word_list[i]
    #At this point, all of the words have been counted and
    #appended to the list
    #return the sorted list
    frequency.sort()
    return frequency

def write_word_list(file_name, word_list):
    with open(file_name, 'w') as out_file:
        #write each element of the word_list to the file
        for word in word_list:
            out_file.write(f'{word}\n')

def main():
    #get the filename by calling get_file_name
    #and put the value into a variable
    name_of_file = get_file_name('Which file do you want to analyze? ')
    #Pass the file name variable to the read_file_contents
    #function and put the result into a variable
    #that will contain the file contents
    file_contents = read_file_contents(name_of_file)
    #Pass the file contents list to the
    #establish_word_frequency function
    #and put the result into a variable
    results = establish_word_frequency(file_contents)
    #get the filename by calling get_file_name
    #and put the value into a variable
    output_file = get_file_name('What is the name of the output file? ')
    #Pass the file name and the word frequency list to
    #the write_word_list function
    write_word_list(output_file, results)

if __name__ == '__main__':
    main()
Here's a snippet of words.txt:
the
college
learning
the
process
and
history
of
papermaking
participating
the
workshop
the
history
CodePudding user response:
Instead of debugging your code, please allow me to suggest a different approach.
Sorting the list of words just to count them is inefficient. Sorting takes O(n log n) time, where n is the number of items in the list (i.e., its length).
If you instead iterate over the list once and count the words in a dictionary, the total time drops to O(n) on average, because a dictionary is basically a hash table and each lookup or update takes O(1) time on average.
Here is the code:
def establish_word_frequency(word_list):
    freq = {}
    for word in word_list:
        if word in freq:
            #seen before: add one to its count
            freq[word] += 1
        else:
            #first time we see this word
            freq[word] = 1
    return freq
I think this is nice and simple, but you can also just use the built-in Counter from the collections module, which is probably implemented similarly:
from collections import Counter

def establish_word_frequency(word_list):
    return Counter(word_list)
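One thing to watch if you drop either version into the original program: write_word_list loops over whatever it is given and writes each element, and looping over a dictionary yields only the keys, so the counts would be lost. Here is a minimal sketch of one way to bridge that, assuming you want 'word : count' lines like the original code builds. format_frequency is a name I made up for illustration; it is not part of the original code.

```python
from collections import Counter


def establish_word_frequency(word_list):
    # Counter is a dict subclass mapping each word to its number of occurrences.
    return Counter(word_list)


def format_frequency(freq):
    # Hypothetical helper: turn {word: count} into sorted 'word : count' strings,
    # the shape the original write_word_list() receives.
    return [f'{word} : {count}' for word, count in sorted(freq.items())]


words = ['the', 'history', 'the', 'learning', 'the']
lines = format_frequency(establish_word_frequency(words))
print(lines)  # ['history : 1', 'learning : 1', 'the : 3']
```

In main() you would then pass format_frequency(results) to write_word_list instead of results.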
CodePudding user response:
The solution provided by @Orius is more efficient and more readable, but since you're not supposed to use dictionaries, I've made some corrections to your version of the code instead.
def get_file_name(title):
    file_name = input(title)
    return file_name

def read_file_contents(file_name):
    #The purpose of this function is to read the contents of the
    #file that contains the words
    #First thing that we need to do is to create a Python list to
    #hold the words that are in the file
    word_list = []
    #We must always assume that the user can make a typo when
    #entering a file name. Having a program
    #"crash" is definitely not user-friendly, so we encase the
    #reading of the file within a try/except block
    #so that if an error on opening the file occurs, we can
    #"end our program" very gracefully with
    #an error message
    try:
        with open(file_name,'r') as input_file:
            #we will read the file, one line at a time,
            #until we have read all of the lines in the file.
            #As we read each line, we will first remove the
            #"new line character" that is at the end of each
            #line. Then, we will split the line into words
            #(words are preceded and followed by spaces), and
            #then append each word to our list of words
            for line in input_file:
                line = line.rstrip()
                words = line.split(' ')
                for word in words:
                    word_list.append(word)
    except:
        # If the file cannot be found,
        #then an "error message" is printed and it quits the program
        print("A major error has occurred!")
        quit()
    #Now, we need to return the list of words so
    #that we can do the counting.
    return word_list

def establish_word_frequency(word_list):
    word_list.sort()
    #create a list that will contain the words and their count.
    #This list will hold strings.
    frequency = []
    #an empty list has nothing to count
    if not word_list:
        return frequency
    #start with the first word, which has been seen once so far
    cur_word = word_list[0]
    count = 1 #initialise count with 1 since least count of every word is 1.
    #For each remaining word in the word_list
    for i in range(1, len(word_list)):
        #see if this word is the same as the word being counted
        if word_list[i] == cur_word:
            #If it is, add one to the current count
            count += 1
        #otherwise (meaning the word is different)
        else:
            #the run of cur_word has ended, so concatenate it with a
            #space, colon, space and the string equivalent of the
            #integer count (use str() for this)
            #Append the word and its count to the frequency list
            frequency.append(cur_word + ' : ' + str(count))
            #start counting the new word and set the count back to 1
            cur_word = word_list[i]
            count = 1
    #the last word never reaches the else branch, so append it here
    frequency.append(cur_word + ' : ' + str(count))
    #At this point, all of the words have been counted and
    #appended to the list
    #return the sorted list
    frequency.sort()
    return frequency

def write_word_list(file_name, word_list):
    with open(file_name, 'w') as out_file:
        #write each element of the word_list to the file
        for word in word_list:
            out_file.write(f'{word}\n')

def main():
    #get the filename by calling get_file_name
    #and put the value into a variable
    name_of_file = get_file_name('Which file do you want to analyze? ')
    #Pass the file name variable to the read_file_contents
    #function and put the result into a variable
    #that will contain the file contents
    file_contents = read_file_contents(name_of_file)
    #Pass the file contents list to the
    #establish_word_frequency function
    #and put the result into a variable
    results = establish_word_frequency(file_contents)
    #get the filename by calling get_file_name
    #and put the value into a variable
    output_file = get_file_name('What is the name of the output file? ')
    #Pass the file name and the word frequency list to
    #the write_word_list function
    write_word_list(output_file, results)

if __name__ == '__main__':
    main()
There were a couple of mistakes in your code:
- count was initialised to 0, which is wrong because every word that appears at all has a count of at least 1.
- The word and its count were paired incorrectly when appending to frequency: when the word changes, the count collected so far belongs to the previous word, not to word_list[i].
I suggest you rewrite your own code once you understand where the mistakes were.
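If you want something to check your rewrite against, here is a rough sketch of the same sorted-list idea with no dictionary. It uses itertools.groupby from the standard library to collect each run of equal words, so it is a swapped-in technique rather than your professor's loop. count_sorted_runs is just a name I made up, and the sample list is the words.txt snippet from the question.

```python
from itertools import groupby


def count_sorted_runs(word_list):
    # Hypothetical helper name. sorted() makes a copy, so the caller's
    # list is left in its original order.
    frequency = []
    # groupby yields (word, iterator over that run of consecutive equal words).
    for word, run in groupby(sorted(word_list)):
        count = sum(1 for _ in run)
        frequency.append(word + ' : ' + str(count))
    return frequency


sample = ['the', 'college', 'learning', 'the', 'process', 'and', 'history',
          'of', 'papermaking', 'participating', 'the', 'workshop', 'the',
          'history']
result = count_sorted_runs(sample)
for line in result:
    print(line)
```

For that snippet, 'the' should come out as 4, 'history' as 2, and every other word as 1. If your own loop disagrees on the last word in sorted order, check that you append the final word after the loop ends.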