find duplicate files in a .txt file using csv to fill

Time:04-27

I have tried the method below, but it only finds duplicates when the entire row is the same. I want to find duplicates specifically in column 1. The values are imported from a .txt file as comma-separated items. The headings (entry and rating) are not in the .txt file, only column 1 and column 2. The list numbers shown here are not in the file either, just the comma-separated values.

  1. 996,0

  2. 996,1.67

  3. 123,0

  4. 123,8.13

  5. 456,0

  6. 456,0.00001

     seen_rows = []
     duplicate_rows = []
     for row in csv.reader(in_file):
         if row in seen_rows:
             duplicate_rows.append(row)
             print("Duplicate entry found in entry.txt file please correct this issue then run the program again. duplicates are as follows:", duplicate_rows)
         else:
             seen_rows.append(row)
             print(seen_rows)
    

If someone could point me in the right direction, that would be great. Please feel free to explain how the solution works; I am still very new to Python and I am stuck on this one. Thanks in advance.

CodePudding user response:

I found a way that works, shown below for anyone who would like to know:

    with open("filename.txt", "r") as in_file:            # open the file you want to check (.txt or .csv)
        seen_rows = []                                    # column 1 values seen so far
        duplicate_rows = []                               # column 1 values that appeared more than once
        for row in in_file:                               # for every line in the file
            columns = row.strip().split(",")              # strip surrounding whitespace, then split the line on commas
            if columns[0] in seen_rows:                   # column 1 value already seen, so record it as a duplicate
                duplicate_rows.append(columns[0])
            else:
                seen_rows.append(columns[0])              # first time this value appears, so remember it
    if not duplicate_rows:
        # the list is empty ("not duplicate_rows" is True), so there are no duplicates;
        # do whatever you want to do here
        pass
    else:
        # the list is not empty ("not duplicate_rows" is False), so the else branch runs
        print("Duplicate number found in column 1")

If anyone has another way of doing this, please feel free to let me know of a better way.

P.S. This method works without pandas. If I have made any incorrect assumptions, please correct the code. Thanks, and I hope this helps someone out there understand how this works.
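
One possible variation on the loop above, offered as a sketch rather than the definitive way: keep the seen values in a set instead of a list (membership checks on a set do not slow down as the file grows) and let `csv.reader` do the splitting, as in the original attempt. The function name `find_duplicate_keys`, its `column` parameter, and the in-memory sample are assumptions made up for illustration; the sample just reuses the six rows from the question.

```python
import csv
import io


def find_duplicate_keys(in_file, column=0):
    """Return the values that occur more than once in the chosen column.

    Each duplicated value is reported once, in the order its first repeat is seen.
    """
    seen = set()        # every column value encountered so far
    duplicates = []     # values that have shown up at least twice
    for row in csv.reader(in_file):
        if not row:     # csv.reader yields an empty list for a blank line
            continue
        key = row[column].strip()
        if key in seen:
            if key not in duplicates:
                duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


# In-memory stand-in for the .txt file, using the rows from the question.
sample = io.StringIO("996,0\n996,1.67\n123,0\n123,8.13\n456,0\n456,0.00001\n")
print(find_duplicate_keys(sample))  # ['996', '123', '456']
```

With a real file, the same function could be called as `with open("filename.txt", newline="") as in_file: duplicates = find_duplicate_keys(in_file)`, and an empty list back means column 1 has no repeats.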
