Getting unique values from csv file, output to new file

Time:12-02

I am trying to get the unique values in each column of a CSV file. Here's an example of the file:

12,life,car,good,exellent
10,gift,truck,great,great
11,time,car,great,perfect

The desired output in the new file is this:

12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect

Here is my code:

def attribute_values(in_file, out_file):
    fname = open(in_file)
    fout = open(out_file, 'w')

    # get the header line
    header = fname.readline()
    # get the attribute names
    attrs = header.strip().split(',')

    # get the distinct values for each attribute
    values = []
    
    for i in range(len(attrs)):
        values.append(set())

    # read the data
    for line in fname:
        cols = line.strip().split(',')
        
        for i in range(len(attrs)):
            values[i].add(cols[i])

        # write the distinct values to the file
        for i in range(len(attrs)):
            fout.write(attrs[i] + ',' + ','.join(list(values[i])) + '\n')

    fout.close()
    fname.close()

The code currently outputs this:

12,10
life,gift
car,truck
good,great
exellent,great
12,10,11
life,gift,time
car,car,truck
good,great
exellent,great,perfect

How can I fix this?

CodePudding user response:

You could try to use zip to iterate over the columns of the input file, and then eliminate the duplicates:

import csv

def attribute_values(in_file, out_file):
    with open(in_file, "r") as fin, open(out_file, "w") as fout:
        for column in zip(*csv.reader(fin)):
            items, row = set(), []
            for item in column:
                if item not in items:
                    items.add(item)
                    row.append(item)
            fout.write(",".join(row) + "\n")

Result for the example file:

12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
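
For context, the code in the question appears to go wrong in two places: it reads the first data row as a header, and its write loop sits inside the read loop, so a partial result is written after every row. Because it collects values in a set, the order of the values is not guaranteed either. Below is a more compact sketch of the same zip idea, not a definitive implementation. The function name unique_column_values is only illustrative, and the sketch assumes every row has the same number of fields, since zip stops at the shortest row.

```python
import csv

def unique_column_values(in_file, out_file):
    # newline="" is what the csv module documentation recommends for files it handles
    with open(in_file, newline="") as fin, open(out_file, "w", newline="") as fout:
        # lineterminator="\n" overrides the csv.writer default of "\r\n"
        writer = csv.writer(fout, lineterminator="\n")
        # zip(*rows) transposes the rows, yielding one tuple per column
        for column in zip(*csv.reader(fin)):
            # dict.fromkeys keeps only the first occurrence of each value,
            # preserving the order in which the values appear (Python 3.7+)
            writer.writerow(list(dict.fromkeys(column)))
```

Using csv.writer instead of a manual join means that values containing commas or quotes would be quoted on output, which the plain string join does not do.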