How to compare cells with string formats in a .csv file and return the top five highest (Python)

Time:12-06

I am scraping a web page and getting a list of authors with their ratings. I saved the data in a .csv file, and now I would like to process it and create a list of the five most-rated authors.

Here is what the .csv file looks like: [screenshot of the CSV file]

Here is what I have done so far:

import csv


with open('goodreads-book.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    next(csv_reader)  # skip the header row

    with open('TopFiveRatedAuthors.csv', 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file)

        for line in csv_reader:
            rate = line[1]
            rate = rate[19:-8]  # keep only the number, e.g. '615,027'
            if rate > '100,000':  # string comparison: this is the line in question
                csv_writer.writerow(line)

My question is about this line:

if rate > '100,000':

Right now it returns some seemingly random rows. Instead, I would like the code to compare the ratings against each other and return only the five highest-rated authors. I am quite new to this topic and would really appreciate any help.

Answer:

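Because Python compares strings lexicographically (character by character), comparing numbers stored as strings does not reliably give numeric results. For example, '10000' > '900' returns False, because the first characters are compared first and '1' sorts before '9'. To compare the values numerically, convert them to integers first with something like: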

rate = rate[19:-8]                  # e.g. '615,027'
rate = int(rate.replace(',', ''))   # remove the commas before converting
if rate > 100000:                   # compare integers, not strings
    ...                             # write the row as in the question
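To make the difference concrete, here is a small sketch that compares the same values as strings and as integers. The helper name `parse_rate` is hypothetical, and it assumes the rate cell looks like `4.13 avg rating -- 615,027 ratings` (the format shown in the other answer), so that the slice `[19:-8]` from the question isolates the count.

```python
def parse_rate(cell: str) -> int:
    """Extract the ratings count from a cell like '4.13 avg rating -- 615,027 ratings'.

    Hypothetical helper: assumes the count starts at index 19 and the cell
    ends with ' ratings' (8 characters), which is what rate[19:-8] relies on.
    """
    return int(cell[19:-8].replace(',', ''))


# Strings are compared character by character: '1' < '9', so this is False.
print('10000' > '900')            # False
# Integers are compared by value.
print(int('10000') > int('900'))  # True

cell = '4.13 avg rating -- 615,027 ratings'
print(parse_rate(cell))           # 615027
print(parse_rate(cell) > 100000)  # True
```

The fixed slice breaks as soon as the prefix changes length (for example a leading space in the scraped text), which is why splitting on whitespace, as in the next answer, is usually more robust.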

Answer:

You could split the Rate column on spaces. The fifth part (index 4) should be the number of ratings. Then remove the commas and convert it to an integer. For example:

import csv

with open('goodreads-book.csv', 'r', newline='') as f_input, open('TopFiveRatedAuthors.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)     # copy header to output

    for row in csv_input:
        # '4.13 avg rating -- 615,027 ratings' -> '615,027' -> 615027
        rating = int(row[1].split(' ')[4].replace(',', ''))

        if rating > 100000:
            csv_output.writerow(row)
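This keeps every row above 100,000 rather than the five highest that the question asks for. A minimal sketch of the top-five step follows; it is not from either answer, it assumes the second column always has the `4.13 avg rating -- 615,027 ratings` format, and the function names and sample rows are hypothetical. It uses `heapq.nlargest` from the standard library to pick the rows with the largest counts.

```python
import csv
import heapq


def ratings_count(row):
    """Ratings count from a row whose second column reads '4.13 avg rating -- 615,027 ratings'."""
    return int(row[1].split()[4].replace(',', ''))


def top_rated(rows, n=5):
    """Return the n rows with the largest ratings count, highest first."""
    return heapq.nlargest(n, rows, key=ratings_count)


def write_top_five(in_path='goodreads-book.csv', out_path='TopFiveRatedAuthors.csv'):
    """Copy the header and the five most-rated rows from in_path to out_path."""
    with open(in_path, newline='') as f_input, open(out_path, 'w', newline='') as f_output:
        csv_input = csv.reader(f_input)
        csv_output = csv.writer(f_output)
        csv_output.writerow(next(csv_input))  # copy the header
        csv_output.writerows(top_rated(csv_input))


# Hypothetical rows in the assumed format, just to show the ordering.
rows = [
    ['Author A', '4.13 avg rating -- 615,027 ratings'],
    ['Author B', '3.98 avg rating -- 1,204,311 ratings'],
    ['Author C', '4.40 avg rating -- 98,210 ratings'],
    ['Author D', '4.02 avg rating -- 250,000 ratings'],
    ['Author E', '3.75 avg rating -- 12,345 ratings'],
    ['Author F', '4.21 avg rating -- 730,500 ratings'],
]
for row in top_rated(rows):
    print(row[0], ratings_count(row))
# Author B 1204311
# Author F 730500
# Author A 615027
# Author D 250000
# Author C 98210
```

With the real file in the working directory, calling `write_top_five()` writes the header plus the five rows. For a file this size, `sorted(rows, key=ratings_count, reverse=True)[:5]` works just as well. If "top rated" should mean the highest average rating instead of the most ratings, key on `float(row[1].split()[0])`.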

You need to put a textual version of your CSV file into your question to allow it to be tested. If there is a problem, add print(row) to see which row it fails on and then also print(row[1].split(' ')) to see if it is splitting correctly.

For example, 4.13 avg rating -- 615,027 ratings should be split into the list:

['4.13', 'avg', 'rating', '--', '615,027', 'ratings']

So index [4] is needed to get the number of ratings.
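Following the debugging advice in this answer, one option is to report the rows that do not split as expected instead of letting `int()` raise on the first one. This is a sketch under the same assumption (the count is the fifth whitespace-separated token); the function names and the sample data are hypothetical. It uses `split()` with no argument, because `split(' ')` yields an empty first element when the cell starts with a space, which would shift the count to a different index.

```python
import csv
import io


def parse_ratings_count(cell):
    """Return the count from '4.13 avg rating -- 615,027 ratings', or None if it doesn't fit."""
    parts = cell.split()  # no argument: splits on runs of whitespace, ignores leading/trailing spaces
    try:
        return int(parts[4].replace(',', ''))
    except (IndexError, ValueError):
        return None


def report_bad_rows(csv_file):
    """Print every data row whose second column cannot be parsed; return how many there were."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    bad = 0
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        cell = row[1] if len(row) > 1 else ''
        if parse_ratings_count(cell) is None:
            print(f'line {line_number}: {row}')
            print('  split:', cell.split())
            bad += 1
    return bad


# Hypothetical sample data; the real column names and values may differ.
sample = io.StringIO(
    'Author,Rate\n'
    'Author A,"4.13 avg rating -- 615,027 ratings"\n'
    'Author B,not available\n'
)
print(report_bad_rows(sample))
```

With the sample above, only the `Author B` row is reported (its cell splits into `['not', 'available']`), and the function returns 1. Passing the real file object opened with `newline=''` works the same way.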
