I got multiple excel file which needs to be need if OLD data is matching NEW data. Normally I use dataframe.equals but since the NEW data is containing additional columns this doesn't work anymore.
Very excel file contains two tabs with OLD and NEW data. I have to check if the OLD data is matching per record in NEW. The issue of NEW is that columns aren't in the same order, additional columns plus record aren't in the same order.
The code I normally use to check but it is giving
import os
import pandas as pd
TargetFolder = os.listdir('Dir')
for file in TargetFolder:
df = pd.ExcelFile('TargetFolder' file)
dfPrep = pd.read_excel(df, 'OLD')
dfCE = pd.read_excel(df, 'NEW')
Checkdf = dfPrep.equals(dfCE)
CodePudding user response:
IIUC, you can use pandas.DataFrame.loc
to select/pick the exact OLD
columns from the NEW
ones then use pandas.DataFrame.sort_values
to reorder the rows by the two columns Column4
and Column8
.
Try this :
from pathlib import Path
import pandas as pd
a_directory= "path_to_the_folder_containing_the_excel_files"
for file in Path(a_directory).glob("*.xlsx"):
dfPrep = pd.read_excel(file, sheet_name="OLD")
dfCE = pd.read_excel(file, sheet_name="NEW")
dfCE_adapted= (
dfCE.loc[:, dfPrep.columns]
.sort_values(by=["Column4", "Column8"], ignore_index=True)
)
Checkdf= dfPrep.equals(dfCE_adapted)
if Checkdf:
print(file.stem)
else:
pass
If the two sheets OLD
and NEW
matches, then the Excel filename will be printed.