Home > Software engineering >  Compare sheets across in two Excel files that have many sheets
Compare sheets across in two Excel files that have many sheets

Time:09-15

I am looking to compare two Excel workbooks that should be identical to identify where there are differences between the content.

The below code I found here works great, but I have some workbooks that have many varying numbers of sheets (some will have one sheet, others will have 70 sheets across the two workbooks). Is there a way to iterate through all of the dataframes/sheets in the workbook (e.g. a range of indices) without having to hard code the index numbers? Thanks!

In block 1
sheet1 = rb1.sheet_by_index(0)

Then in block 2
sheet1 = rb1.sheet_by_index(1)

Then in block 3
sheet1 = rb1.sheet_by_index(2)
from itertools import izip_longest
import xlrd

rb1 = xlrd.open_workbook('file1.xlsx')
rb2 = xlrd.open_workbook('file2.xlsx')

sheet1 = rb1.sheet_by_index(0)
sheet2 = rb2.sheet_by_index(0)

for rownum in range(max(sheet1.nrows, sheet2.nrows)):
    if rownum < sheet1.nrows:
        row_rb1 = sheet1.row_values(rownum)
        row_rb2 = sheet2.row_values(rownum)

        for colnum, (c1, c2) in enumerate(izip_longest(row_rb1, row_rb2)):
            if c1 != c2:
                print "Row {} Col {} - {} != {}".format(rownum 1, colnum 1, c1, c2)
    else:
        print "Row {} missing".format(rownum 1)```

CodePudding user response:

You can make use of "openpyxl" library to get the sheetnames for an excel.

from openpyxl import load_workbook 
excel = load_workbook(filepath, read_only=True,keep_links=False)
sheet_names = excel.sheetnames
print(sheet_names) # prints the sheetnames of the excel in form of a list

You can iterate over the "sheet_names" variable

for sheet in sheet_names:
   df = pd.read_excel(filepath,sheet,engine='openpyxl')
  • Related