I can already print the files in a directory that are duplicates. What I want is to print both the duplicate file and the original file it duplicates. Below is my code.
import xml.etree.ElementTree as ET
from pathlib import Path

path = "Z:/PMT_Training/SoftCo/d_i"

def duplicatecheck():
    DATA_DIR = Path(path)
    files = sorted(DATA_DIR.glob('*.xml'))
    invoice_number = {}  # invoice number -> first file that contained it
    duplicateFiles = []
    for i in range(0, len(files)):
        tree = ET.parse(files[i])
        root = tree.getroot()
        record = root.findall('record')
        for item in record:
            invoice = item.find('invoice_number').text
            if invoice in invoice_number:
                duplicateFiles.append(files[i])
                print("Duplicate file found: ", files[i])
                break
            else:
                invoice_number[invoice] = files[i]

duplicatecheck()
Below is my output:
Duplicate file found: file (1).xml
Duplicate file found: file (2).xml
Duplicate file found: file (3).xml
What I want to print is the duplicate file together with the file it was found to duplicate,
like below:
Duplicate file found: file (1).xml, file (a).xml
Duplicate file found: file (2).xml, file (a).xml
Duplicate file found: file (3).xml, file (a).xml
In other words, when a file is found to be a duplicate, I want to print both files.
CodePudding user response:
When the check
if invoice in invoice_number
is true, your dictionary already holds the first file in which that invoice number was seen. Internally it looks something like this:
{
    'my_invoice_number': 'file.xml',
    'my_other_invoice_number': 'file2.xml',
}
So all you need to do is look up the stored file and print it alongside the duplicate:
print("Duplicate file found: ", files[i], invoice_number[invoice])
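
For reference, here is a minimal sketch of the whole check with that lookup folded in. It assumes the XML layout implied by the question (record elements directly under the root, each with an invoice_number child). The function name find_duplicates and the choice to return pairs instead of printing are illustrative and not part of the original code.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_duplicates(data_dir):
    """Return (duplicate, original) path pairs for XML files sharing an invoice number."""
    first_seen = {}  # invoice number -> first file that contained it
    pairs = []
    for xml_file in sorted(Path(data_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        for record in root.findall("record"):
            invoice = record.findtext("invoice_number")
            if invoice in first_seen:
                # Pair the current file with the file stored for this invoice.
                pairs.append((xml_file, first_seen[invoice]))
                break  # one match is enough to flag this file
            first_seen[invoice] = xml_file
    return pairs
```

Calling find_duplicates(path) and printing the .name of both entries in each pair should give output in the format asked for. Returning the pairs also keeps the duplicate/original relationship available for later steps such as moving or deleting files.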