I have tried the method below, but it only finds duplicates when the entire row is the same. I want duplicates specifically from the first column (column[1]). The values are imported from a .txt file as comma-separated items. The headings "entry" and "rating" are not in the .txt file, only the column 1 and column 2 values. The numbers are not in the file, just the comma-separated values:
996,0
996,1.67
123,0
123,8.13
456,0
456,0.00001
seen_rows = []
duplicate_rows = []
for row in csv.reader(in_file):
    if row in seen_rows:
        duplicate_rows.append(row)
        print("Duplicate entry found in entry.txt file please correct this issue then run the program again. duplicates are as follows:", duplicate_rows)
    else:
        seen_rows.append(row)
print(seen_rows)
If someone could point me in the right direction, that would be great. Please feel free to explain how the solution works: I am still very new to Python and I am stuck on this one. Thanks in advance.
CodePudding user response:
I found a way that works, shown below for anyone who would like to know:
with open("filename.txt", "r") as in_file:  # open the file you want to check (.txt or .csv)
    seen_rows = []  # first-column values seen so far
    duplicate_rows = []  # first-column values that appear more than once
    for row in in_file:  # for every line in the file, do the below
        columns = row.strip().split(",")  # strip surrounding whitespace, then split on commas: columns[0], columns[1], ...
        if columns[0] in seen_rows:  # column 1 value already seen, so add it to the duplicate list
            duplicate_rows.append(columns[0])
        else:
            seen_rows.append(columns[0])  # first time this value appears, so add it to the seen list
if not duplicate_rows:
    pass  # the list is empty (not duplicate_rows is True): no duplicates, so do whatever you want here
else:
    print("Duplicate number found in column 1")  # the list is not empty (not duplicate_rows is False), so the else branch runs
If anyone has a better way of doing this, please feel free to let me know.
P.S. This method works without pandas. If I have made any incorrect assumptions, please correct the code. Thanks, and I hope this helps someone understand how this works.
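One possible variation, offered as a sketch rather than a definitive answer: count how often each first-column value appears and report the ones seen more than once. It uses the standard library's csv.reader and collections.Counter instead of manual splitting and list lookups. The function name find_duplicate_keys and its column parameter are made up for illustration, and the sample data from the question is read from an in-memory string instead of a real file.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(lines, column=0):
    """Return the values that appear more than once in `column`, in first-seen order.

    `lines` can be an open file or any iterable of text lines.
    (Hypothetical helper, not part of the original post.)
    """
    counts = Counter()
    for row in csv.reader(lines):
        if not row:  # csv.reader yields an empty list for blank lines; skip them
            continue
        counts[row[column].strip()] += 1
    # Counter keeps insertion order, so duplicates come back in first-seen order.
    return [key for key, n in counts.items() if n > 1]


# Sample data from the question, held in memory instead of a file.
sample = io.StringIO("996,0\n996,1.67\n123,0\n123,8.13\n456,0\n456,0.00001\n")
duplicates = find_duplicate_keys(sample)

if duplicates:
    print("Duplicate entries found in column 1:", duplicates)
else:
    print("No duplicates found.")
```

To run it against a real file, something like `with open("entry.txt", newline="") as in_file: duplicates = find_duplicate_keys(in_file)` should work, assuming the file has the same two-column layout. Counting in a dictionary-like structure avoids rescanning a list for every row, which may matter for larger files.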