I am trying to make a sorting system.
I have the following values in a .csv file
Dan,20,30,15
Dan,15,20,20
Dan,17,11,10
Alex,10,10,10
Alex,11,20,30
For each name, only the last row (the name along with its values) should remain, and the earlier rows for that name should be deleted. For example, the following two rows should be written back into the .csv file and everything else deleted:
Dan,17,11,10
Alex,11,20,30
It sounds so much easier than it actually is and I seriously need help with this sorting algorithm.
CodePudding user response:
You can collect your rows in a dictionary keyed by a given column (the name, in your case) and then write its contents back into the file.
import csv

unique_rows = {}
with open("data.csv", "r", newline="") as in_file:
    for row in csv.reader(in_file):
        unique_rows[row[0]] = row  # where 0 is the index of your key column

with open("data.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    writer.writerows(unique_rows.values())
Each time a key repeats, the later row overwrites the one already stored in the dictionary. If the key is not in the dictionary yet, the row is simply stored.
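To make the overwrite behaviour concrete, here is a minimal sketch of the same idea as a function that works on any iterable of rows. The name `keep_last_rows` and its `key_index` parameter are made up for illustration, and the sample data from the question is read from an in-memory string rather than a real file.

```python
import csv
import io


def keep_last_rows(rows, key_index=0):
    """Return the last row seen for each key, ordered by each key's first appearance."""
    unique_rows = {}
    for row in rows:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        unique_rows[row[key_index]] = row  # a later row replaces an earlier one
    return list(unique_rows.values())


# Sample data from the question, held in memory instead of a file
sample = "Dan,20,30,15\nDan,15,20,20\nDan,17,11,10\nAlex,10,10,10\nAlex,11,20,30\n"
result = keep_last_rows(csv.reader(io.StringIO(sample)))
print(result)  # [['Dan', '17', '11', '10'], ['Alex', '11', '20', '30']]
```

Since a dict keeps its keys in insertion order (Python 3.7+), Dan stays ahead of Alex even though Dan's row was replaced twice.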
CodePudding user response:
I think, as @mozway suggested, you're looking for the groupby.last method. Here it is applied to your example:
import pandas as pd

df = pd.DataFrame(
    [
        ['Dan', 20, 30, 15],
        ['Dan', 15, 20, 20],
        ['Dan', 17, 11, 10],
        ['Alex', 10, 10, 10],
        ['Alex', 11, 20, 30],
    ],
    columns=['Name', 'A', 'B', 'C']
)
print(df.groupby('Name').last())
       A   B   C
Name
Alex  11  20  30
Dan   17  11  10
CodePudding user response:
You need to read all the data and save it as a list of dicts. Then rebuild that list as a key-value dict with the name as the key, so that every time the same name comes up its value is reassigned.
Example:
name,v1,v2,v3,
Dan,20,30,50,
Dan,24,2,75,
Dan,25,78,23,
Alex,12,22,98,
Alex,33,12,32,
code:
import csv

data = []
with open('csvFile.csv') as csv_file:
    data = [{k: v for k, v in row.items()}
            for row in csv.DictReader(csv_file, skipinitialspace=True)]

new_data = {}
for item in data:
    new_data[item['name']] = [item[value] for value in item]

final_data = [new_data[item] for item in new_data]
print(final_data)
#output
[['Dan', '25', '78', '23', ''], ['Alex', '33', '12', '32', '']]
CodePudding user response:
I assume that you already have the content of the CSV in a list, and that rows with the same name sit next to each other.
There is no need for external libraries; just use enumerate()
to get the index. With the index you can look at the row that follows the current one.
This should do what you want:
def get_uniques():
    csv_reading = ['Dan,20,30,15', 'Dan,15,20,20', 'Dan,17,11,10', 'Alex,10,10,10', 'Alex,11,20,30']
    final_result = []
    for index, row in enumerate(csv_reading):
        name = row.split(',')[0]  # get the actual name e.g. 'Dan'
        if index < len(csv_reading) - 1:  # needed to avoid index errors
            next_name = csv_reading[index + 1].split(',')[0]  # name in the next row
            if name != next_name:  # the next row belongs to a different name
                final_result.append(row)
        else:
            final_result.append(row)  # always append the last row
    return final_result
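Comparing each row with the next one only works when rows for the same name are adjacent. Under that same assumption, the standard library's itertools.groupby can express the idea more directly: it splits the list into runs of consecutive rows that share a name, and the last row of each run is kept. This is only a sketch, and the function name `last_of_each_run` is made up for illustration.

```python
from itertools import groupby


def last_of_each_run(lines):
    """Keep the last line of every run of consecutive lines sharing a first field."""
    result = []
    for _name, run in groupby(lines, key=lambda line: line.split(',')[0]):
        result.append(list(run)[-1])  # final line of this run
    return result


csv_reading = ['Dan,20,30,15', 'Dan,15,20,20', 'Dan,17,11,10', 'Alex,10,10,10', 'Alex,11,20,30']
print(last_of_each_run(csv_reading))  # ['Dan,17,11,10', 'Alex,11,20,30']
```

If a name can show up again further down the file, each separate run contributes its own row, so a dictionary keyed by name is the safer choice in that case.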
CodePudding user response:
The code below outputs a list of lists. Each inner list has the key as its first element and the list of remaining values as its second. It keeps the latest row for each key:
[["Dan", ["17", "11", "10"]], ["Alex", ["11", "20", "30"]]]
import json

# Open the file in read mode; the with block closes it afterwards
with open("file", "r") as file:
    # Convert the file contents into a list of lines
    lst = file.read().split()

dic = {}
# Use the first element of each line as the key;
# the remaining elements are the values for the key
for line in lst:
    line = line.split(",")
    key, value = line[0], line[1:]
    dic[key] = value  # a later line overwrites an earlier one with the same key

# Convert the dict into a list of (key, values) pairs
pairs = list(dic.items())
# Serialize the nested list as a JSON string
result = json.dumps(pairs)
print(result)
CodePudding user response:
Just convert the column that should not contain duplicates to a dict and back to a list, using this command:
list(dict.fromkeys(objectid))
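As a rough illustration of what this command does: dict.fromkeys builds a dict whose keys are the distinct values in first-seen order, so wrapping it in list() gives the unique values. Here `objectid` is assumed to be a list holding a single column (the names from the question). This only deduplicates that one column; it does not by itself pick the last row for each name.

```python
# Assumption: objectid holds the values of a single column, here the names
objectid = ['Dan', 'Dan', 'Dan', 'Alex', 'Alex']

unique_ids = list(dict.fromkeys(objectid))  # distinct values, first-seen order
print(unique_ids)  # ['Dan', 'Alex']
```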