I am working a script for reading specific cells from an Excel workbook into a list, and then from the list into a CSV. There's a loop to get workbooks open from a folder as well.
My code:
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_values = [] # List for storing cell values from worksheets
for workbook in workbooks: # Workbook iteration
wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True) # Open workbook
sheet = wb.active # Get sheet
f = open('../record.csv', 'w', newline='') # Open the CSV file
cell_list = ["I9", "AK6", "N35"] # List of cells to check
with f: # CSV writer loop
record_writer = csv.writer(f) # Open CSV writer
for cells in cell_list: # Loop through cell list to get cell values and write them to the cell_values list
cell_values.append(sheet[cells].value) # Append cell values to the cell_values list
record_writer.writerow(cell_values) # Write cell_values list to CSV
quit() # Terminate program after all workbooks in the folder have been analyzed
The output just puts all values on the same line, albeit separated by commas, but it doesn't help me when I go to open my results in Excel if everything is on the same line. When I was using xlrd
, the format was vertical but all I had to do was transpose the dataset to be good. But I had to change from xlrd
(which was a smart move in general) because it would not read merged cells.
I get this:
4083940,140-21-541,NP,8847060,140-21-736,NP
When I want this
4083940,140-21-541,NP
8847060,140-21-736,NP
Edit - I forgot the "what have I tried" portion of my post. I have tried changing my loops around to avoid overwriting the previous write to the CSV. I have tried clearing the list on each loop to get the script to treat each new entry as a new line. I have tried adding \n
in the writer line as I saw in a couple of posts. I have tried to use writerows
instead of writerow
. I tried A
instead of W
even though it is a fix and not a solution but that didn't quite work right either.
CodePudding user response:
Your main problem is that cell_values
is accumulating the cells from multiple sheets. You need to reset it, like, cell_values = []
, for every sheet.
I went back to your original example and:
- moved the opening of record.csv up, and placed all the work inside the scope of that file being open and written into
- moved
cell_values = []
inside your workbook loop - moved
cell_list = ["I9", "AK6", "N35"]
to the top, because that's really scoped for the entire script, if every workbook has the same cells - removed
quit()
, it's not necessary at the very end of the script, and in general should probably be avoided: Python exit commands - why so many and when should each be used?
import csv
import openpyxl
import os
path = r'C:\Users.....' # Folder holding workbooks
workbooks = os.listdir(path)
cell_list = ["I9", "AK6", "N35"] # List of cells to check
with open('record.csv', 'w', newline='') as f:
record_writer = csv.writer(f)
for workbook in workbooks:
wb = openpyxl.load_workbook(os.path.join(path, workbook), data_only=True)
sheet = wb.active
cell_values = [] # reset for every sheet
for cells in cell_list:
cell_values.append(sheet[cells].value)
# Write one row per sheet
record_writer.writerow(cell_values)
Also, I can see your new the CSV module, and struggling a little conceptually (since you tried writerow
, then writerows
, trying to debug your code). Python's official document for CSV doesn't really give practical examples of how to use it. Try reading up here, Writing to a CSV.