7.8 LAB: Word frequencies (lists and CSV) - How can I read a csv and eliminate duplicates?

Time:08-09

This is as far as I got. I ended up being able to create the list of words and count but I have been working on this for legit over an hour and can't figure out how to remove the duplicates. I've tried making new lists, dict, tuples, etc. I'm ramming my head into a wall.


Write a program that first reads in the name of an input file and then reads the file using the csv.reader() method. The file contains a list of words separated by commas. Your program should output the words and their frequencies (the number of times each word appears in the file) without any duplicates.

Ex: If the input is:

input1.csv

and the contents of input1.csv are:

hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy

the output is:

hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1

Note: There is a newline at the end of the output, and input1.csv is available to download.

import csv
user_input = input() 
with open(user_input, 'r') as name_CSV: 
    paper_copy = csv.reader(name_CSV)
    for lines in paper_copy:
        for w in lines: 
            words_cnt = lines.count(w)
            print(w, words_cnt)

CodePudding user response:

A very Pythonic way is to add all the words to a list and convert the list to a set; then you're done.

Sets allow only one instance of each value, so converting a list of strings to a set (sets are written in curly braces, like dictionaries) removes all duplicates immediately.

import csv

word_list=[]
user_input = input() 
with open(user_input, 'r') as name_CSV: 
    paper_copy = csv.reader(name_CSV)
    for lines in paper_copy:
        for w in lines: 
            word_list.append(w)
word_set = set(word_list)
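
To see what the conversion does, here is a minimal sketch using a short made-up list instead of the lab file (the list contents and the name unique_words are just for illustration). Note that the comparison is case-sensitive, so "Hello" and "hello" both survive:

```python
# A small hypothetical word list containing repeats.
words = ["hello", "cat", "man", "cat", "Hello", "man"]

# Converting to a set keeps one copy of each distinct string.
unique_words = set(words)

print(len(words), len(unique_words))  # 6 4
print("Hello" in unique_words, "hello" in unique_words)  # True True
```

The set only tells you which words exist. The counts still have to come from the original list, and a set does not remember the order in which the words appeared.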

CodePudding user response:

There are a few ways to remove duplicates from a list, but the simplest is to convert it to a set.

So first load in your words with csv.reader as you are told:

import csv

word_list = []
with open('input1.csv') as name_CSV:
    paper_copy = csv.reader(name_CSV)
    for line in paper_copy:
        for word in line:
            word_list.append(word)

Then convert it to a set. Sets don't keep any order (they are, by definition, unordered), so I sorted the set with key=word_list.index to restore the order in which the words first appear in word_list.

unique_words = sorted(set(word_list), key=word_list.index)
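
As an aside, word_list.index rescans the list for every word, which can get slow on large files. A possible alternative, assuming Python 3.7 or later (where dicts keep insertion order), is dict.fromkeys. The sketch below hardcodes the sample words from input1.csv rather than reading the file, and the names unique_sorted and unique_dict are only for the comparison:

```python
# Words from the sample input1.csv, hardcoded for this sketch.
word_list = ["hello", "cat", "man", "hey", "dog", "boy", "Hello",
             "man", "cat", "woman", "dog", "Cat", "hey", "boy"]

# Approach from this answer: deduplicate with a set, then restore first-seen order.
unique_sorted = sorted(set(word_list), key=word_list.index)

# Alternative: dict keys are unique and keep insertion order (Python 3.7+).
unique_dict = list(dict.fromkeys(word_list))

print(unique_sorted == unique_dict)  # True
print(unique_dict)
```

Both give the same list; the dict version just makes a single pass over word_list.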

Then, to get your output, loop over the unique words, and for each one loop over the full word list, incrementing a counter every time the two words match:

for x in unique_words:
    count = 0
    for y in word_list:
        if x == y:
            count += 1
    print(x, count)

Output:

hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1

Or you could do it in fewer lines with count(). I still think it is a good idea to at least look at the method above and try to understand how it works.

for x in unique_words:
    print(x, word_list.count(x))

Output:

hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1
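
For completeness, the standard library's collections.Counter can do the deduplicating and the counting in one pass. This is only a sketch of that alternative, not something the lab requires: the helper name word_frequencies is made up, and io.StringIO stands in for the opened file so the snippet runs without input1.csv. It still reads the data with csv.reader, and it relies on Python 3.7+ so the counts come out in first-seen order:

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_file):
    """Count every word in an open CSV file, keeping first-seen order."""
    counts = Counter()
    for row in csv.reader(csv_file):
        counts.update(row)  # adds 1 for each word in the row
    return counts


# Stand-in for open(user_input): the contents of the sample input1.csv.
sample = io.StringIO(
    "hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n"
)

frequencies = word_frequencies(sample)
for word, count in frequencies.items():
    print(word, count)
```

In the actual lab you would pass the file object from with open(user_input) as name_CSV: instead of the StringIO stand-in.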