I'm trying to sort data corresponding to a list I made based on a CSV in Python

I have a CSV file with the columns name, city and height. I need to return all the heights that belong to the same city, so I have created a list of all the unique cities:

uniqueCity = []
for i in city:
    if i not in uniqueCity:
        uniqueCity.append(i)

I am able to print all the heights for each city, but I can't seem to sort them by height within each city:

def printCity(city):
    for i in uniqueCity:
        print(i)
        for j in range(len(city)):
            if i == city[j]:
                print(name[j], height[j])

What am I missing?

I am not allowed to use any third-party libraries.

Full code:

import csv

name = []
city = []
height = []
with open('heightData.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    next(csvreader)  # skip the header row
    # Read the rows while the file is still open
    for row in csvreader:
        name.append(row[0])
        city.append(row[1])
        height.append(int(row[2]))


city.sort()

uniqueCity = []
for i in city:
    if i not in uniqueCity:
        uniqueCity.append(i)

def printCity(city):
    for i in uniqueCity:
        print(i)
        for j in range(len(city)):
            if i == city[j]:
                print(name[j], height[j])
printCity(city)

Sample data:

name,city,height
Mariam Cox,St_Paul,67
Daniel Ashley,St_Paul,65
Oliver Clay,Minneapolis,75
Rae Finley,Minneapolis,81
Brady Joyce,Virginia,68
Harding Jones,Virginia,80

Expected output:

Minneapolis:
Oliver Clay 75
Rae Finley 81
St_Paul:
Daniel Ashley 65
Mariam Cox 67
Virginia:
Brady Joyce 68
Harding Jones 80

Answer:

The problem is that once you split the data into separate lists, one per column, nothing links the values that came from the same row. When you call city.sort(), only the city list is reordered; name and height keep their original order, so city[j] no longer belongs to the same row as name[j] and height[j].
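To see the misalignment concretely, here is a small sketch using three of the sample rows. The list names mirror the question's; sorted_city, misaligned and rows are illustrative names. zip() pairs the lists up by index:

```python
# Parallel lists holding three of the sample rows from the question
name = ["Mariam Cox", "Daniel Ashley", "Oliver Clay"]
city = ["St_Paul", "St_Paul", "Minneapolis"]
height = [67, 65, 75]

# Sorting the city list on its own reorders only that list ...
sorted_city = sorted(city)
# ... so index 0 now pairs Mariam Cox with Minneapolis, which is not in the data
misaligned = list(zip(name, sorted_city, height))

# Sorting whole rows keeps each name, city and height together
rows = sorted(zip(name, city, height), key=lambda row: row[1])
```

Here misaligned[0] is ('Mariam Cox', 'Minneapolis', 67), while rows starts with ('Oliver Clay', 'Minneapolis', 75) as expected.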

Instead, put each row into a tuple and collect the tuples in a list. Then call sort() on that list, using the key argument to pick the column to sort by (here, item [2] of each row, to sort by height):

import csv

with open('heightData.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    next(csvreader)
    csvdata = []
    for row in csvreader:
        row[2] = int(row[2])
        csvdata.append(tuple(row))

csvdata.sort(key=lambda row: row[2])

With the sample data, that gives:

csvdata = [('Daniel Ashley', 'St_Paul', 65),
 ('Mariam Cox', 'St_Paul', 67),
 ('Brady Joyce', 'Virginia', 68),
 ('Oliver Clay', 'Minneapolis', 75),
 ('Harding Jones', 'Virginia', 80),
 ('Rae Finley', 'Minneapolis', 81)]
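The key function can also return a tuple. Tuples compare element by element, so the sketch below orders rows by city and, within each city, by height, which is already the order of the expected output. The rows are hard-coded from the sample data here instead of being read from heightData.csv:

```python
# Rows as (name, city, height) tuples, hard-coded from the sample data
csvdata = [
    ("Mariam Cox", "St_Paul", 67),
    ("Daniel Ashley", "St_Paul", 65),
    ("Oliver Clay", "Minneapolis", 75),
    ("Rae Finley", "Minneapolis", 81),
    ("Brady Joyce", "Virginia", 68),
    ("Harding Jones", "Virginia", 80),
]

# Compare by city first; height only breaks ties between rows in the same city
csvdata.sort(key=lambda row: (row[1], row[2]))
```

After the sort the names run Oliver Clay, Rae Finley, Daniel Ashley, Mariam Cox, Brady Joyce, Harding Jones.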

From your edit, I see that you want to group the data by city first and then print each city's people sorted by height. There are two ways to group the data:

Sort by city, then use itertools.groupby() from Python's standard library:

import itertools

csvdata.sort(key=lambda row: row[1]) # Sort by city
grouped_rows = {k: list(v) for k, v in itertools.groupby(csvdata, key=lambda row: row[1])} # Group by city
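The sort before groupby() matters because groupby() only merges consecutive items that share a key. A small sketch with deliberately unsorted rows (hard-coded, names illustrative) shows a city appearing twice; in the dict comprehension above, the second run would silently overwrite the first:

```python
import itertools

# Rows deliberately NOT sorted by city (hard-coded from the sample data)
rows = [
    ("Mariam Cox", "St_Paul", 67),
    ("Oliver Clay", "Minneapolis", 75),
    ("Daniel Ashley", "St_Paul", 65),
]

# groupby() starts a new group whenever the key changes,
# so St_Paul shows up as two separate groups here
unsorted_keys = [k for k, _ in itertools.groupby(rows, key=lambda row: row[1])]

# After sorting by the same key, each city forms a single run
rows.sort(key=lambda row: row[1])
sorted_keys = [k for k, _ in itertools.groupby(rows, key=lambda row: row[1])]
```

unsorted_keys is ['St_Paul', 'Minneapolis', 'St_Paul'], while sorted_keys is ['Minneapolis', 'St_Paul'].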

Alternatively, build a dictionary whose keys are cities and whose values are lists of the rows for that city, using collections.defaultdict:

import collections

grouped_rows = collections.defaultdict(list)
for row in csvdata:
    city = row[1]
    grouped_rows[city].append(row)

Either way, you can then iterate over grouped_rows in city order, sort each city's rows on item [2] (height), and print them:

for city in sorted(grouped_rows.keys()):
    city_rows = sorted(grouped_rows[city], key=lambda row: row[2])
    print(city)
    for row in city_rows:
        print("\t", row[0], row[2])

Minneapolis
     Oliver Clay 75
     Rae Finley 81
St_Paul
     Daniel Ashley 65
     Mariam Cox 67
Virginia
     Brady Joyce 68
     Harding Jones 80
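Putting the pieces together, here is a minimal, self-contained sketch that uses only the standard library (csv, collections, io) and prints the output in the exact format from the question. The function names group_by_city and print_city are illustrative, and io.StringIO stands in for heightData.csv so the example runs without the file:

```python
import collections
import csv
import io

# Stand-in for heightData.csv so the sketch runs on its own;
# for the real file, pass open('heightData.csv', newline='') instead
SAMPLE = """name,city,height
Mariam Cox,St_Paul,67
Daniel Ashley,St_Paul,65
Oliver Clay,Minneapolis,75
Rae Finley,Minneapolis,81
Brady Joyce,Virginia,68
Harding Jones,Virginia,80
"""


def group_by_city(csvfile):
    """Return {city: [(name, height), ...]} with each list sorted by height."""
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    groups = collections.defaultdict(list)
    for name, city, height in reader:
        groups[city].append((name, int(height)))
    for people in groups.values():
        people.sort(key=lambda person: person[1])
    return groups


def print_city(groups):
    """Print each city (alphabetically) followed by its people, shortest first."""
    for city in sorted(groups):
        print(f"{city}:")
        for name, height in groups[city]:
            print(name, height)


groups = group_by_city(io.StringIO(SAMPLE))
print_city(groups)
```

Because each name travels with its height in one tuple, sorting can never separate them, which is the failure mode of the parallel-list version.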

Answer:

For the assignment, it had to be a function. But this seems to work for me.

import collections

# Map each city to a sorted list of the heights recorded for it
# (reads the global height list built from the CSV)
def heightTuple(city):
    cityHeight = collections.defaultdict(list)
    for i in range(len(city)):
        cityHeight[city[i]].append(height[i])
    for i in cityHeight:
        cityHeight[i].sort()
    print(cityHeight)
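The function above reads the global height list and keeps only the heights, so the names are dropped. A minimal sketch of a variant that takes all three parallel lists as parameters and keeps each name with its height (heights_by_city is a hypothetical name, not from the answer above):

```python
import collections


def heights_by_city(names, cities, heights):
    """Map each city to its (name, height) pairs, sorted by height.

    Takes the three parallel lists as parameters instead of reading
    globals, and keeps each name next to its height.
    """
    by_city = collections.defaultdict(list)
    for name, city, height in zip(names, cities, heights):
        by_city[city].append((height, name))
    # Sorting (height, name) tuples orders by height; the name breaks ties
    return {
        city: [(name, height) for height, name in sorted(pairs)]
        for city, pairs in sorted(by_city.items())
    }


result = heights_by_city(
    ["Mariam Cox", "Daniel Ashley", "Oliver Clay"],
    ["St_Paul", "St_Paul", "Minneapolis"],
    [67, 65, 75],
)
```

Here result is {'Minneapolis': [('Oliver Clay', 75)], 'St_Paul': [('Daniel Ashley', 65), ('Mariam Cox', 67)]}. It must be called with the lists before any of them is sorted on its own.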