Sorting a dictionary by value when the values are large float numbers (tried lambda and itemgetter b-CodePudding

I have a csv file with people's names and averages as below:

mandana,7.5
hamid,6.066666666666666
sina,11.285714285714286
sara,9.75
soheila,7.833333333333333
ali,5.0
sarvin,11.375

I want to sort it by the averages and write it into another file. I've tried lambda and itemgetter but I didn't get the proper result. Here is my code:

def calculate_sorted_averages(file1, file2):
    with open (r'C:\Users\sony\Desktop\Python with Jadi\file1.csv', 'r') as f1:
        reader=csv.reader(f1)
        d={}
        for row in reader:
            name=row[0]
            average=row[1]
            d[name]=average
        sorted_dict=OrderedDict(sorted(d.items(), key=operator.itemgetter(1), reverse=True))
        with open (r'C:\Users\sony\Desktop\Python with Jadi\file2.csv', 'w', newline='') as f2:
            for key in sorted_dict.keys():
                writer=csv.writer(f2)
                writer.writerow([key,sorted_dict[key]])

And here is my output:

sara,9.75
soheila,7.833333333333333
mandana,7.5
hamid,6.066666666666666
ali,5.0
sarvin,11.375
sina,11.285714285714286

As you can see, it is not sorted. I've also tried lambda and it didn't work. I'm now frustrated and don't know what to do. Can anyone help me? Thanks.

CodePudding user response：

You got your result because you're sorting lexicographically (comparing your floats as strings) instead of sorting by their numeric value.
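
To see the difference concretely, here is a minimal sketch using the averages from the question, kept as strings the way csv.reader returns them (the variable names are only illustrative). Sorting the raw strings compares them character by character, so anything starting with "1" ends up below "5.0" in descending order; passing key=float makes the comparison numeric.

```python
# Averages exactly as csv.reader returns them: strings, not floats.
averages = ["7.5", "6.066666666666666", "11.285714285714286", "9.75",
            "7.833333333333333", "5.0", "11.375"]

# String comparison goes character by character, so "9.75" > "11.375".
as_strings = sorted(averages, reverse=True)

# key=float compares by numeric value while leaving the items as strings.
as_numbers = sorted(averages, key=float, reverse=True)

print(as_strings)
print(as_numbers)
```

The first list reproduces the order in the question's output file; the second is the numeric order the question was after.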

All you're missing is a conversion of each average to float before storing it. After that you can sort as usual with key=operator.itemgetter(1).

import csv
import operator
from collections import OrderedDict

def calculate_sorted_averages(file1, file2):
    d = {}
    with open(file1, 'r', newline='') as f1:
        reader = csv.reader(f1)
        for row in reader:
            name = row[0]
            average = row[1]
            d[name] = float(average)  # convert so the sort compares numbers, not strings
    sorted_dict = OrderedDict(sorted(d.items(), key=operator.itemgetter(1), reverse=True))
    with open(file2, 'w', newline='') as f2:
        writer = csv.writer(f2)  # create the writer once, outside the loop
        for key in sorted_dict.keys():
            writer.writerow([key, sorted_dict[key]])

CodePudding user response：

aaa = {'0': ['mandana', 7.5], '1': ['hamid', 6.066666666666666], '2': ['sina', 11.285714285714286], '3': ['sara', 9.75],
       '4': ['soheila', 7.833333333333333], '5': ['ali', 5.0], '6': ['sarvin', 11.375]}

sorted_ = sorted(aaa.items(), key=lambda x: x[1][1])
sorted_ = dict(sorted_)

Output

{'5': ['ali', 5.0], '1': ['hamid', 6.066666666666666], '0': ['mandana', 7.5], '4': ['soheila', 7.833333333333333], '3': ['sara', 9.75], '2': ['sina', 11.285714285714286], '6': ['sarvin', 11.375]}

You didn't show the entire dictionary with its keys, so I created my own, 'aaa'. The sort key is the second element of each value list, which is the average.
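
The same idea carries over to a dictionary shaped like the one in the question, with names as keys and averages as values. This is only a sketch under that assumption: the averages are taken to be floats already, and the names averages and by_average are illustrative. Since Python 3.7 a plain dict keeps insertion order, so the sorted pairs can go straight back into dict.

```python
# Name -> average, already converted to float (data from the question).
averages = {
    "mandana": 7.5,
    "hamid": 6.066666666666666,
    "sina": 11.285714285714286,
    "sara": 9.75,
    "soheila": 7.833333333333333,
    "ali": 5.0,
    "sarvin": 11.375,
}

# Each item is a (name, average) tuple, so item[1] is the average.
by_average = dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))

print(list(by_average))
```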

CodePudding user response：

By default, text read from a file, with or without csv.reader, comes back as strings. You need to call float on the second element of each row to interpret it as a floating-point number.

I think using an OrderedDict is a bit overkill here. One call to sorted is enough.

import csv

def calculate_sorted_averages(filename_input, filename_output):
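    # newline='' is what the csv module recommends for files it reads or writes;
    # without it, the output can contain blank lines between rows on Windows.
    with open(filename_input, 'r', newline='') as f1:
        reader = csv.reader(f1)
        sorted_rows = sorted(reader, key=lambda x: float(x[1]))
    with open(filename_output, 'w', newline='') as f2:
        writer = csv.writer(f2)
        writer.writerows(sorted_rows)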

calculate_sorted_averages('file1.csv', 'file2.csv')

Results:

$ cat file1.csv 
mandana,7.5
hamid,6.066666666666666
sina,11.285714285714286
sara,9.75
soheila,7.833333333333333
ali,5.0
sarvin,11.375

$ cat file2.csv
ali,5.0
hamid,6.066666666666666
mandana,7.5
soheila,7.833333333333333
sara,9.75
sina,11.285714285714286
sarvin,11.375
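
The question sorted with reverse=True, so a descending variant may be closer to what was wanted. The sketch below is one way to do that; sort_rows_by_average and its descending parameter are hypothetical names, and io.StringIO stands in for the real files (with only a few of the rows) so the example runs without touching disk.

```python
import csv
import io

def sort_rows_by_average(rows, descending=False):
    """Return the rows sorted by the float value of their second column."""
    return sorted(rows, key=lambda row: float(row[1]), reverse=descending)

# In-memory stand-in for file1.csv, using a few rows from the question.
source = io.StringIO(
    "mandana,7.5\nhamid,6.066666666666666\nsina,11.285714285714286\nsara,9.75\n"
)
rows = sort_rows_by_average(csv.reader(source), descending=True)

# In-memory stand-in for file2.csv.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

The rows stay as strings throughout; float is applied only inside the sort key, so the values are written back exactly as they were read.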

CodePudding user response：

You can try the pandas module for this.

The pandas.read_csv() function reads the csv file whose path you pass in and returns a pandas dataframe, which is, in simpler words, a table you can work with inside Python.

import pandas as pd

df = pd.read_csv(r"C:\Users\sony\Desktop\Python with Jadi\file1.csv", header=None)    # Raw string so the backslashes are not treated as escapes; header=None because the file has no header row.
df.columns = ["Name", "Value"]    # To set the column names. Only do this if the file doesn't already have a header row.
sorted_df = df.sort_values(by="Value")    # Sorting the dataframe by the values in the "Value" column

Output -

	Name	Value
5	ali	5.0
1	hamid	6.066666666666666
0	mandana	7.5
4	soheila	7.833333333333333
3	sara	9.75
2	sina	11.285714285714286
6	sarvin	11.375

You can convert this dataframe back to a csv file using to_csv(). Pass in the file path as the parameter and set index = False if you don't want the index to be added as a column.
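
As a rough sketch of that last step, the example below reads a few of the question's rows, sorts them, and writes them back out with to_csv(). The io.StringIO objects are stand-ins for real file paths, and header=False is an assumption made here so that the output stays headerless like the input file.

```python
import io
import pandas as pd

# In-memory stand-in for the input file, using a few rows from the question.
source = io.StringIO("mandana,7.5\nsarvin,11.375\nali,5.0\n")
df = pd.read_csv(source, header=None, names=["Name", "Value"])
sorted_df = df.sort_values(by="Value")

# index=False drops the row index; header=False omits the column names.
out = io.StringIO()
sorted_df.to_csv(out, index=False, header=False)
print(out.getvalue())
```

With a real file you would pass the output path to to_csv() instead of the buffer.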

CodePudding user response：

Pandas can be used for this. You can install it with pip install pandas.

import pandas as pd

df = pd.read_csv('filename.csv', header=None)  # header=None: the file has no header row, so the first line is data
df.columns = ['name', 'value']
df.sort_values('value', inplace=True, ascending=True)

print(df)