I'm trying to create a script which would run through excel files in a folder and copy the contents to a workbook. The aim is to copy the contents of each file onto different columns where the spacing between the columns is a set difference, ie. columns: A, D(A 3) & G(D 3). For my example I am running my code with 3 base datasets.
When I run the code, the final dataset ends up copying across the final excel document 3 times across the specified columns, instead of copying the 3 unique documents to the specified columns.
What I want: A B C
What I get: C C C
Code:
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
for file in os.listdir(file_path):
if file.endswith('.xlsx'):
print(f'Loading file {file}...')
wb = load_workbook(file_path file)
ws = wb.worksheets[0]
wb1 = load_workbook(new_path 'data.xlsx')
ws1 = wb1.active
#calculate max rows and columns in source dataset
mr = ws.max_row
mc = ws.max_column
m = [0,3,6]
#copying data to new sheet
for i in range(1,mr 1):
for j in range(1,mc 1):
for y in range(0,3):
#reading cell value from source
c = ws.cell(row = i, column = j)
#writing read value to destination
ws1.cell(row = i, column = j int(m[y])).value = c.value
wb1.save(new_path 'data.xlsx')
Thank you for your help.
Edit:
The data is all in the same format and looks like:https://ibb.co/TMStH9j Current output: https://ibb.co/dmcbSJ1 Desired output: https://ibb.co/C1nqKJv
CodePudding user response:
You need to move the creation and saving of the new workbook out of the for loop so that it is not overwritten each time a new file is looped over.
Also you need a way to count how many files you have looped over, so that you can increment the columns where the new data is copied to in the new workbook. Please see below:
Edit:
To get your expected output, I also removed the inner-most for loop and m list to rather use a single variable to space the columns of each new excel data apart.
import os
import openpyxl
from openpyxl import Workbook, load_workbook
import string
# Create new workbook outside of for loop so that it is not overwritten each loop
wb1 = Workbook()
ws1 = wb1.active
# count variable so each loop increments the column where the data is posted
count = 0
# how many columns to space data apart
col_spacing = 2
for file in os.listdir(file_path):
if file.endswith(".xlsx"):
print(f"Loading file {file}...")
wb = load_workbook(file_path file)
ws = wb.worksheets[0]
# calculate max rows and columns in source dataset
mr = ws.max_row
mc = ws.max_column
# copying data to new sheet
for i in range(1, mr 1):
for j in range(1, mc 1):
# reading cell value from source
c = ws.cell(row=i, column=j)
# writing read value to destination
ws1.cell(row=i, column=count j (count * col_spacing)).value = c.value
# increment column count
count = 1
# save new workbook after all files have been looped through
wb1.save(new_path "data.xlsx")