I have 24 excel files, I'm aiming to copy the .xslx data and to their respective 24 .csv files. I have copied the data over however its creating 10 copies in the .csv files, I believe it has something to do with the for
loops. Ive tried to use writerow()
rather than writerows()
yet that does help. I'm trying to understand openpyxl and its writer and reader objects.
import openpyxl, os, csv
from pathlib import Path
for excelFile in os.listdir('./excelspreadsheets'):
if excelFile.endswith('.xlsx'): # Skip non xlsx files, load the workbook object
wb = openpyxl.load_workbook('./excelspreadsheets/' excelFile)
for sheetName in wb.sheetnames:
# Loop through every sheet in the workbook
sheet = wb[sheetName]
sheetTitle = sheet.title
# Create the CSV filename from the Excel filename and sheet title
p = Path(excelFile)
excelFileStemName = p.stem
CsvFilename = excelFileStemName '_' sheetTitle '.csv'
# Create the csv.writer object for this CSV file
print(f'Creating filename {CsvFilename}...')
outputFile = open(CsvFilename, 'w', newline='')
outputWriter = csv.writer(outputFile)
# Create reader object for each excel sheet
fileObj = open('./excelspreadsheets/' excelFile)
fileReaderObj = csv.reader(fileObj)
# Loop through every row in the excel sheet
for rowNum in range(1, sheet.max_row 1):
rowData = [] # append each cell to this list
# Loop through each cell in the row
for colNum in range(1, sheet.max_column 1):
rowData.append(sheet.values)
# write the rowData list to the CSV file.
for row in rowData:
outputWriter.writerows(row)
outputFile.close()
So, each of the newly created .csv files writes the correct data but does it 10 times, rather than once.
Appreciate any feedback thanks.
CodePudding user response:
You can use read_excel
and to_csv
, which come as part of pandas to read excel file and write the data to csv file. It is just simpler from coding perspective, as the read and write will be done in one line. It also uses Openpyxl
underneath. The updated code is below.
import openpyxl, os, csv
from pathlib import Path
import pandas as pd
for excelFile in os.listdir('./excelspreadsheets'):
if excelFile.endswith('.xlsx'): # Skip non xlsx files, load the workbook object
xls = pd.ExcelFile('./excelspreadsheets/' excelFile)
for sheetname in xls.sheet_names:
#Read each sheet into df
df = pd.read_excel('./excelspreadsheets/' excelFile, sheetname)
#Remove .xlsx from filename and create CSV name
CsvFilename = excelFile.rstrip('.xlsx') '_' sheetname '.csv'
print(f'Creating filename {CsvFilename}...')
#Write df as CSV to file
df.to_csv(CsvFilename, index=False)
Let me know if you see any errors...