I'm using Python OCR to convert images to text and want to detect duplicates. I check the results one by one so that duplicates are easier to locate.
Example: listA = [1, 2, 3, 4, 4, 5, 6]
so that while I append to listA, it can show that 4 is a duplicate.
Main issue: my list "listOfElems" is empty.
I want to save each text and detect, one by one, whether it is already in the list.
from PIL import Image
import pytesseract
import cv2
import numpy as np
from os import listdir
from os.path import isfile, join
mypath = "/home/DC_ton/desktop/test_11_8/output02"
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
print(onlyfiles)
i = 1
listOfElems = []
Number_of_onlyfiles = len(onlyfiles)
while i < Number_of_onlyfiles :
    each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
    image = Image.open(each_file_path)
    text = pytesseract.image_to_string(image, lang='eng')
    print(text)
    for text in listOfElems:
        if text not in listOfElems:
            listOfElems.append(text)
        else:
            print("here get duplicate")
    i += 1
print(listOfElems)
newlist = []
duplist = []
def checkIfDuplicates_1(listOfElems):
    ''' Check if given list contains any duplicates '''
    if len(listOfElems) == len(set(listOfElems)):
        return False
    else:
        return True
result = checkIfDuplicates_1(listOfElems)
if result:
    print('Yes, list contains duplicates')
else:
    print('No duplicates found in list')
for k in listOfElems:
    if k not in newlist:
        newlist.append(k)
    else:
        duplist.append(k)
print("List of duplicates", duplist)
Output (my list "listOfElems" is empty, and I want to compare one by one):
['final_output_11.png', 'final_output_6.png', 'final_output_17.png', 'final_output_8.png', 'final_output_15.png', 'final_output_14.png', 'final_output_2.png', 'final_output_12.png', 'final_output_21.png', 'final_output_3.png', 'final_output_24.png', 'final_output_18.png', 'final_output_19.png', 'final_output_10.png', 'final_output_29.png', 'final_output_9.png', 'final_output_20.png', 'final_output_7.png', 'final_output_31.png', 'final_output_30.png', 'final_output_25.png', 'final_output_1.png', 'final_output_16.png', 'final_output_5.png', 'final_output_27.png', 'final_output_13.png', 'final_output_28.png', 'final_output_4.png', 'final_output_23.png', 'final_output_26.png', 'final_output_22.png']
CA7T4B2
CAT7T4BF
CAT4B8
CAT4BE
CAT4C4
CAT4C1
CAT4B7
CA7T4CB
CAT4cs
CAT4B4
CAT4BA
CAT7T4BC
CA74B9
CAT4BD
(CAT4AF
CAT4CA
[]
No duplicates found in list
List of duplicates []
Image link (I can check the entire set for duplicates; I just don't know how to do it one by one):
https://imgur.com/a/RGUumoy
I searched and found a discussion of a similar case, "How to get Array one by one Randomly in array order in Python", but I could not adapt it to my case, so I still need a hand.
CodePudding user response:
You create an empty list, never add anything to it, and then iterate over it, so the loop body never runs:
i = 1
listOfElems = [] # <- empty
Number_of_onlyfiles = len(onlyfiles)
while i < Number_of_onlyfiles :
    each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
    image = Image.open(each_file_path)
    text = pytesseract.image_to_string(image, lang='eng')
    print(text)
    for text in listOfElems: # <- still empty
        if text not in listOfElems:
            listOfElems.append(text)
        else:
            print("here get duplicate")
    i += 1
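The effect can be reproduced in isolation. The following is a minimal sketch with made-up variable names (list_of_elems and iterations are not from the question); it only shows that a loop over an empty list runs zero times, so an append placed inside it is never reached.

```python
# Iterating over an empty list never enters the loop body,
# so an append placed inside that loop can never run.
list_of_elems = []   # hypothetical stand-in for listOfElems
iterations = 0       # counts how often the loop body executes

for item in list_of_elems:
    iterations += 1
    if item not in list_of_elems:
        list_of_elems.append(item)  # unreachable while the list is empty

print(iterations)     # 0
print(list_of_elems)  # []
```

Note also that the question's loop reuses the name text as the loop variable, so with a non-empty list it would overwrite the OCR result it was meant to check.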
An easy solution is to add the current element to the list if it is not in there already, like so:
while i < Number_of_onlyfiles :
    each_file_path = '/home/DC_ton/desktop/test_11_8/output02/' + onlyfiles[i]
    image = Image.open(each_file_path)
    text = pytesseract.image_to_string(image, lang='eng')
    print(text)
    if text not in listOfElems:
        listOfElems.append(text)
    else:
        print("Duplicate")
    i += 1  # move on to the next file, otherwise the loop never ends
Also note that indexes start at 0, so i should be 0 at the beginning. You also don't have to iterate over a list to check whether an element is in it; just use the "in" operator.
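Both points can be illustrated with a short sketch. The strings are sample values taken from the question's output, and the file names are hypothetical.

```python
seen = ["CAT4B8", "CAT4BE"]  # sample OCR strings from the question's output

# "in" scans the list for you; no explicit inner loop is needed.
print("CAT4B8" in seen)      # True
print("CAT4C4" not in seen)  # True

# Indexes start at 0, so starting at 0 also visits the first element.
files = ["a.png", "b.png", "c.png"]  # hypothetical file names
visited = []
i = 0
while i < len(files):
    visited.append(files[i])
    i += 1

print(visited)  # ['a.png', 'b.png', 'c.png']
```

Starting at i = 1 instead would silently skip the first file.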
You could also save a couple of lines by iterating over onlyfiles:
for file in onlyfiles:
    file_path = join(mypath, file)  # mypath has no trailing slash, so use os.path.join
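Putting these suggestions together, one-by-one duplicate detection might look like the sketch below. The function name find_duplicates, the sample data, and the decision to strip surrounding whitespace before comparing are all assumptions for illustration, not part of the original code. The stand-in data replaces real OCR output so the sketch runs without Tesseract installed.

```python
def find_duplicates(items):
    """Yield (name, text, first_name) for each item whose text was seen before.

    items: iterable of (name, text) pairs, e.g. (file name, OCR result).
    """
    first_seen = {}  # normalized text -> name of the first item that produced it
    for name, text in items:
        key = text.strip()  # assumption: surrounding whitespace is not significant
        if key in first_seen:
            yield name, key, first_seen[key]
        else:
            first_seen[key] = name


# Stand-in data (hypothetical) instead of real OCR output.
sample = [
    ("img_1.png", "CAT4B8\n"),
    ("img_2.png", "CAT4BE\n"),
    ("img_3.png", "CAT4B8\n"),
]

duplicates = list(find_duplicates(sample))
for name, text, original in duplicates:
    print(f"{name}: {text!r} duplicates {original}")
```

With real images, the pairs could be built from the question's own calls, for example (f, pytesseract.image_to_string(Image.open(join(mypath, f)), lang='eng')) for each f in onlyfiles. A dictionary is used here rather than a list so that each duplicate can be reported together with the file where the text first appeared.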