add specific rows and column into another excel file using openpyxl in python-CodePudding

I'm trying to merge multiple file into one excel file using openpyxl in python

I know there is a way using panda, but my files have a problem there have been always 2 empty rows in the beginning of the excel file

So to avoid that I'm using openpyxl with the old way Just open all files and copy the specific rows and columns to a new one

The first step I find out how to do it by just copy the specific row's and column of the new xlsx file

but I didn't find a way to add the next file (only the value not the header) under the first one

this my code

So far it just copy the first file (the header and the value) But I didn't find out how to add the next file (only the value) under the first one

import openpyxl as xl
from openpyxl import Workbook
import os

def find_xlsx_files():
    # the current path
    dir_path = os.path.dirname(os.path.abspath(__file__))

    # list to store files
    res = []
    # Iterate directory
    for file in os.listdir(dir_path):
        # check only xlsx files
        if file.endswith('.xlsx'):
            res.append(file)

    return (res)


wb1 = xl.load_workbook (find_xlsx_files()[0])

ws1 = wb1.worksheets [0] 

# open target Excel file


wb2 = Workbook()
ws =  wb2.active

ws.title = "Changed Sheet"
wb2.save(filename = 'sample_book.xlsx')
ws2 = wb2.active

# calculate the total rows and
# columns in the Excel source file

mr = ws1.max_row

mc = ws1.max_column 

# copy cell values from source
# Excel file to target Excel file

for i in range ( 3 , mr   1 ):
    for j in range ( 2 , mc   1 ):
    # read cell value from Excel source file
        c = ws1.cell (row = i, column = j)
    # writing the read value to the target Excel file
        ws2.cell (row = i, column = j) .value = c.value


# save target Excel file

wb2.save ( str ('sample_book.xlsx'))

CodePudding user response：

What you are doing is creating a list of the excel files in the default directory then just opening the first file '[0]' in the list with the line;

wb1 = xl.load_workbook (find_xlsx_files()[0])

This will never attempt to access any other excel file in the list.
Having the list generation in the load book command isn't good, you don't want to be generating the list of available excel files each time you process a file. Calling of the function find_xlsx_files() should be done once.
The easiest fix to your code is to get your list of excel files and then iterate that list for processing.

excel_files = find_xlsx_files()
for xl_file in excel_files:
    wb1 = xl.load_workbook(xl_file)
...

Also it should not be necessary to save the book until you have finished writing all data.
The function can be simplified using glob instead if you prefer.

import glob
import os
from openpyxl import Workbook, load_workbook

dir_path = os.path.dirname(os.path.abspath(__file__))
excel_files = glob.glob(dir_path   "/[!~]*.xlsx")

for xl_file in excel_files:
    wb1 = load_workbook(xl_file)
    ws1 = wb1.worksheets[0]

    # open target Excel file
    wb2 = Workbook()
    ws = wb2.active

    ws.title = "Changed Sheet"
    # wb2.save(filename='sample_book.xlsx')
    ws2 = wb2.active

    # calculate the total rows and
    # columns in the Excel source file
    mr = ws1.max_row
    mc = ws1.max_column

    # copy cell values from source
    # Excel file to target Excel file
    for i in range(3, mr 1):
        for j in range(2, mc 1):
        # read cell value from Excel source file
            c = ws1.cell(row=i, column=j)
        # writing the read value to the target Excel file
            ws2.cell(row=i, column=j).value = c.value

# save target Excel file
wb2.save(str('sample_book.xlsx'))

This is also assuming there is only one sheet in each excel file you want to process since you're only opening the first sheet.

ws1 = wb1.worksheets[0]