This is as far as I got. I ended up being able to create the list of words and count but I have been working on this for legit over an hour and can't figure out how to remove the duplicates. I've tried making new lists, dict, tuples, etc. I'm ramming my head into a wall.
Write a program that first reads in the name of an input file and then reads the file using the csv.reader() method. The file contains a list of words separated by commas. Your program should output the words and their frequencies (the number of times each word appears in the file) without any duplicates.
Ex: If the input is:
input1.csv
and the contents of input1.csv are:
hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy
the output is:
hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1
Note: There is a newline at the end of the output, and input1.csv is available to download.
import csv
user_input = input()
with open(user_input, 'r') as name_CSV:
    paper_copy = csv.reader(name_CSV)
    for lines in paper_copy:
        for w in lines:
            words_cnt = lines.count(w)
            print(w, words_cnt)
CodePudding user response:
A very Pythonic way is to add all the words to a list and convert the list to a set; then you're done with the duplicates.
A set holds at most one copy of each value, so converting a list of strings to a set (sets are written in curly braces, like dictionaries) removes all duplicates immediately.
import csv
word_list = []
user_input = input()
with open(user_input, 'r') as name_CSV:
    paper_copy = csv.reader(name_CSV)
    for lines in paper_copy:
        for w in lines:
            word_list.append(w)
word_set = set(word_list)
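The snippet above stops once the duplicates are gone, so here is a minimal sketch of how the counts could be printed from that set. It assumes the sample contents from the question and reads them from an in-memory `io.StringIO` instead of a real file so it runs on its own; the `sample` and `counts` names are only for illustration. Note that a set has no defined order, so the lines may not come out in the order the assignment expects.

```python
import csv
import io

# Stand-in for the opened file (assumption: same contents as input1.csv in the question).
sample = io.StringIO("hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n")

word_list = []
for row in csv.reader(sample):
    word_list.extend(row)

word_set = set(word_list)

# One count per unique word; iterating a set gives no guaranteed order.
counts = {w: word_list.count(w) for w in word_set}
for w, n in counts.items():
    print(w, n)
```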
CodePudding user response:
There are a few ways to remove duplicates from a list, but the simplest is to convert it to a set. So first load your words with csv.reader, as the assignment asks:
import csv
word_list = []
with open('input1.csv') as name_CSV:
    paper_copy = csv.reader(name_CSV)
    for line in paper_copy:
        for word in line:
            word_list.append(word)
Then convert it to a set. I sorted the set by each word's first position in word_list, which preserves the order in which the words first appear, because sets don't keep any order on their own (they are, by definition, unordered).
unique_words = sorted(set(word_list), key=word_list.index)
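As a side note, the same first-seen order can also be obtained without a set, since dictionaries keep insertion order in Python 3.7 and later. This is only a sketch of an alternative, not a replacement for the line above; the hard-coded `word_list` is the sample from the question, and `by_sort` and `by_dict` are names made up for the comparison.

```python
# Sample words from the question, already read into a list.
word_list = ["hello", "cat", "man", "hey", "dog", "boy", "Hello",
             "man", "cat", "woman", "dog", "Cat", "hey", "boy"]

# Approach from the answer: de-duplicate with a set, then restore first-seen order.
by_sort = sorted(set(word_list), key=word_list.index)

# Alternative: dict keys are unique and keep insertion order (Python 3.7+).
by_dict = list(dict.fromkeys(word_list))

print(by_dict)
```

Both lists come out the same; the `dict.fromkeys` version just avoids calling `word_list.index` once per unique word.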
Then, to get your output, loop through the unique words, compare each one against every word in the list, and increment a count each time you find a match:
for x in unique_words:
    count = 0
    for y in word_list:
        if x == y:
            count += 1
    print(x, count)
Output:
hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1
Or you could do it in fewer lines with count(). I still think it is a good idea to at least look at the method above and try to understand how it works.
for x in unique_words:
    print(x, word_list.count(x))
Output:
hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1
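For comparison, the standard library's collections.Counter can do the de-duplicating and counting in one pass. The sketch below is one possible way to write it, not the method the assignment requires; `count_words` is a hypothetical helper name, and the file is simulated with `io.StringIO` holding the sample contents from the question so the example is self-contained.

```python
import csv
import io
from collections import Counter


def count_words(file_obj):
    """Return a Counter of every field that csv.reader yields from file_obj."""
    counts = Counter()
    for row in csv.reader(file_obj):
        counts.update(row)  # each word in the row adds 1 to its own count
    return counts


# Stand-in for open(user_input) (assumption: same contents as input1.csv).
sample = io.StringIO("hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n")
counts = count_words(sample)

# Counter keeps first-seen order in Python 3.7+, so this matches the order in the file.
for word, n in counts.items():
    print(word, n)
```

With a real file, the call would be `count_words(name_CSV)` inside the `with open(...)` block.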