Home > Software engineering >  Check Dataframes with Python
Check Dataframes with Python

Time:11-20

I got multiple excel file which needs to be need if OLD data is matching NEW data. Normally I use dataframe.equals but since the NEW data is containing additional columns this doesn't work anymore.

Very excel file contains two tabs with OLD and NEW data. I have to check if the OLD data is matching per record in NEW. The issue of NEW is that columns aren't in the same order, additional columns plus record aren't in the same order.

Table OLD and Table NEW

The code I normally use to check but it is giving

import os
import pandas as pd

TargetFolder = os.listdir('Dir')

for file in TargetFolder:

    df = pd.ExcelFile('TargetFolder'   file)

    dfPrep = pd.read_excel(df, 'OLD')
    dfCE = pd.read_excel(df, 'NEW')

    Checkdf = dfPrep.equals(dfCE)

CodePudding user response:

IIUC, you can use pandas.DataFrame.loc to select/pick the exact OLD columns from the NEW ones then use pandas.DataFrame.sort_values to reorder the rows by the two columns Column4 and Column8.

Try this :

from pathlib import Path
import pandas as pd

a_directory= "path_to_the_folder_containing_the_excel_files"

for file in Path(a_directory).glob("*.xlsx"):
    dfPrep = pd.read_excel(file, sheet_name="OLD")
    dfCE = pd.read_excel(file, sheet_name="NEW")

    dfCE_adapted= (
                    dfCE.loc[:, dfPrep.columns]
                        .sort_values(by=["Column4", "Column8"], ignore_index=True)
                   )

    Checkdf= dfPrep.equals(dfCE_adapted)
    
    if Checkdf:
        print(file.stem)
    else:
        pass

If the two sheets OLD and NEW matches, then the Excel filename will be printed.

  • Related