How to copy multiple .xlsx files into a respective .csv file?-CodePudding

I have 24 excel files, I'm aiming to copy the .xslx data and to their respective 24 .csv files. I have copied the data over however its creating 10 copies in the .csv files, I believe it has something to do with the for loops. Ive tried to use writerow() rather than writerows() yet that does help. I'm trying to understand openpyxl and its writer and reader objects.

import openpyxl, os, csv
from pathlib import Path

for excelFile in os.listdir('./excelspreadsheets'):
    if excelFile.endswith('.xlsx'): # Skip non xlsx files, load the workbook object
        wb = openpyxl.load_workbook('./excelspreadsheets/'   excelFile)
        for sheetName in wb.sheetnames:

            # Loop through every sheet in the workbook
            sheet = wb[sheetName]
            sheetTitle = sheet.title

            # Create the CSV filename from the Excel filename and sheet title
            p = Path(excelFile)
            excelFileStemName = p.stem
            CsvFilename = excelFileStemName   '_'   sheetTitle   '.csv'

            # Create the csv.writer object for this CSV file
            print(f'Creating filename {CsvFilename}...')
            outputFile = open(CsvFilename, 'w', newline='')
            outputWriter = csv.writer(outputFile)

            # Create reader object for each excel sheet
            fileObj = open('./excelspreadsheets/'   excelFile)
            fileReaderObj = csv.reader(fileObj)

            # Loop through every row in the excel sheet
            for rowNum in range(1, sheet.max_row   1):
                rowData = [] # append each cell to this list
                
                # Loop through each cell in the row
                for colNum in range(1, sheet.max_column   1):
                    rowData.append(sheet.values)

            # write the rowData list to the CSV file. 
            for row in rowData:
                outputWriter.writerows(row)

            outputFile.close()

So, each of the newly created .csv files writes the correct data but does it 10 times, rather than once.

Appreciate any feedback thanks.

CodePudding user response：

You can use read_excel and to_csv, which come as part of pandas to read excel file and write the data to csv file. It is just simpler from coding perspective, as the read and write will be done in one line. It also uses Openpyxl underneath. The updated code is below.

import openpyxl, os, csv
from pathlib import Path
import pandas as pd
for excelFile in os.listdir('./excelspreadsheets'):
    if excelFile.endswith('.xlsx'): # Skip non xlsx files, load the workbook object
        xls = pd.ExcelFile('./excelspreadsheets/'   excelFile)
        for sheetname in xls.sheet_names:
            #Read each sheet into df
            df = pd.read_excel('./excelspreadsheets/'   excelFile, sheetname)
            #Remove .xlsx from filename and create CSV name
            CsvFilename = excelFile.rstrip('.xlsx')   '_'   sheetname   '.csv'
            print(f'Creating filename {CsvFilename}...')
            #Write df as CSV to file
            df.to_csv(CsvFilename, index=False)

Let me know if you see any errors...