I'm trying to merge multiple file into one excel file using openpyxl in python
I know there is a way using panda, but my files have a problem there have been always 2 empty rows in the beginning of the excel file
So to avoid that I'm using openpyxl with the old way Just open all files and copy the specific rows and columns to a new one
The first step I find out how to do it by just copy the specific row's and column of the new xlsx file
but I didn't find a way to add the next file (only the value not the header) under the first one
this my code
So far it just copy the first file (the header and the value) But I didn't find out how to add the next file (only the value) under the first one
import openpyxl as xl
from openpyxl import Workbook
import os
def find_xlsx_files():
# the current path
dir_path = os.path.dirname(os.path.abspath(__file__))
# list to store files
res = []
# Iterate directory
for file in os.listdir(dir_path):
# check only xlsx files
if file.endswith('.xlsx'):
res.append(file)
return (res)
wb1 = xl.load_workbook (find_xlsx_files()[0])
ws1 = wb1.worksheets [0]
# open target Excel file
wb2 = Workbook()
ws = wb2.active
ws.title = "Changed Sheet"
wb2.save(filename = 'sample_book.xlsx')
ws2 = wb2.active
# calculate the total rows and
# columns in the Excel source file
mr = ws1.max_row
mc = ws1.max_column
# copy cell values from source
# Excel file to target Excel file
for i in range ( 3 , mr 1 ):
for j in range ( 2 , mc 1 ):
# read cell value from Excel source file
c = ws1.cell (row = i, column = j)
# writing the read value to the target Excel file
ws2.cell (row = i, column = j) .value = c.value
# save target Excel file
wb2.save ( str ('sample_book.xlsx'))
CodePudding user response:
What you are doing is creating a list of the excel files in the default directory then just opening the first file '[0]' in the list with the line;
wb1 = xl.load_workbook (find_xlsx_files()[0])
This will never attempt to access any other excel file in the list.
Having the list generation in the load book command isn't good, you don't want to be generating the list of available excel files each time you process a file. Calling of the function find_xlsx_files() should be done once.
The easiest fix to your code is to get your list of excel files and then iterate that list for processing.
excel_files = find_xlsx_files()
for xl_file in excel_files:
wb1 = xl.load_workbook(xl_file)
...
Also it should not be necessary to save the book until you have finished writing all data.
The function can be simplified using glob instead if you prefer.
import glob
import os
from openpyxl import Workbook, load_workbook
dir_path = os.path.dirname(os.path.abspath(__file__))
excel_files = glob.glob(dir_path "/[!~]*.xlsx")
for xl_file in excel_files:
wb1 = load_workbook(xl_file)
ws1 = wb1.worksheets[0]
# open target Excel file
wb2 = Workbook()
ws = wb2.active
ws.title = "Changed Sheet"
# wb2.save(filename='sample_book.xlsx')
ws2 = wb2.active
# calculate the total rows and
# columns in the Excel source file
mr = ws1.max_row
mc = ws1.max_column
# copy cell values from source
# Excel file to target Excel file
for i in range(3, mr 1):
for j in range(2, mc 1):
# read cell value from Excel source file
c = ws1.cell(row=i, column=j)
# writing the read value to the target Excel file
ws2.cell(row=i, column=j).value = c.value
# save target Excel file
wb2.save(str('sample_book.xlsx'))
This is also assuming there is only one sheet in each excel file you want to process since you're only opening the first sheet.
ws1 = wb1.worksheets[0]