Home > Software design >  How do I sort data from a CSV by a column?
How do I sort data from a CSV by a column?

Time:01-13

I need to organise a CSV file by user ID in ascending order. The csv file has a header that I would like to keep at the top of the document. the headers are below with 13500 rows of data

User_ID;firstname;lastname;location

The code i have currently omits the headings. If I remove the heading=next(csv_reader) line, it puts the headings at the bottom of the document.

The current output does not also put them in correct order but goes off the first value of the ID and not the whole number (ID=13000 comes before ID=2000 through 9999)

import csv
import operator
file = open("file.csv", 'r')

csv_reader = csv.reader(file, delimiter=';')

heading=next(csv_reader)

sort = sorted(csv_reader, key=operator.itemgetter(0))
for eachline in sort:
    print(eachline)

CodePudding user response:

Your current sort happens in lexical order, because the elements of your CSV file are strings. If you want to sort them as integers, have your key function in the sorted call convert them to integers.

sorted_data = sorted(csv_reader, key=lambda row: int(row[0]))

I used a lambda instead of operator.itemgetter(0) because we needed to convert to an int anyway, and this is the most convenient way to do so.

To print the header with the data, print it before printing your data:

print(heading)
for line in sorted_data
    print(line)

CodePudding user response:

You can achieve it with pandas too:

import pandas as pd

df = pd.read_csv(open('file.csv'), delimiter=';')
sorted_df = df.sort_values(by=["User_ID"], ascending=True)
sorted_df.to_csv('file_sorted.csv', sep=';', index=False)

print(sorted_df.to_string(index=False))
  • Related