Home > other >  how to read and compare two excel files with multiple worksheets?
how to read and compare two excel files with multiple worksheets?

Time:02-17

I have two excel files and both of them have 10 worksheets. I wanted to read each worksheets, compare them and print data in 3rd excel file, even that would be written in multiple worksheets.

The below program works for single worksheet

import pandas as pd

df1 = pd.read_excel('zyx_5661.xlsx')
df2 = pd.read_excel('zyx_5662.xlsx')

df1.rename(columns= lambda x : x   '_file1', inplace=True)
df2.rename(columns= lambda x : x   '_file2', inplace=True)

df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer')

with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    df_join.to_excel(writer, sheet_name='testing', index=False)

How can I optimize it to work with multiple worksheets? Please advice.

Thank you.

CodePudding user response:

I think this should achieve what you need. Loop through each sheet name (assuming they're named the same across both excel documents. If not, you can use numbers instead). Write the new output to a new sheet, and save the excel document.

import pandas as pd
    
writer = pd.ExcelWriter('xl_join_diff.xlsx')

for sheet in ['sheet1', 'sheet2', 'sheet3']: #list of sheet names
    #Pull in data for each sheet, and merge together.
    df1 = pd.read_excel('zyx_5661.xlsx', sheet_name=sheet)
    df2 = pd.read_excel('zyx_5662.xlsx', sheet_name=sheet)

    df1.rename(columns= lambda x : x   '_file1', inplace=True)
    df2.rename(columns= lambda x : x   '_file2', inplace=True)

    df_join = df1.merge(right=df2, left_on=df1.columns.to_list(),
                        right_on=df2.columns.to_list(), how='outer')
   
    df_join.to_excel(writer, sheet, index=False) #write to excel as new sheet
    
writer.save() #save excel document once all sheets have been done

CodePudding user response:

You can use the loop to read files and sheets

writer = pd.ExcelWriter('multiple.xlsx', engine='xlsxwriter')
# create writer for writing all sheets in 1 file
list_files=['zyx_5661.xlsx','zyx_5662.xlsx']
count_sheets=0
for file_name in list_files:
     file = pd.ExcelFile(file_name)
     for sheet_name in file.sheet_names:
          df = pd.read_excel(file, sheet_name)
          # ... you can do your process
          count_sheets=count_sheets   1
          df.to_excel(writer, sheet_name='Sheet-' count_sheets)
writer.save()
          
  • Related