Language: Python 3.8
Platform: MacOS 11 | Windows 10
Filetypes: .xlsx | .csv.
Task: File/Format Conversion
Synopsis: My excel file has cells with functions/formulas. I want to save this file as a .csv while preserving the value of the formulas (not the actual string of the function, itself)
What works: Pause script, prompt user to open Excel > 'Save As' .csv // Excel processes the functions within the cells and preserves the values before saving as .csv
What hasn't worked: Using pandas or openpyxl to convert the excel file to a .csv (such as 'wb.save' and 'df.to_csv' // The produced .csv file does not process the function cells and instead outputs nothing within those cells.
Question: Anyway of leveraging excel's 'process the function and save the values' function within the Python script?
Thank you!
Sample Code - Pandas
df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')
Sample Code - Openpyxl
wb = openpyxl.load_workbook('file.xlsx', data_only=True)
sheet = wb.active
with open('file.csv', 'w', newline="") as f:
c = csv.writer(f)
for r in sheet.iter_rows():
c.writerow([cell.value for cell in r])
wb.save('file.csv')
Sample Problem
Excel Columns:
A: ['First Initial']
B: ['Last Name']
C: ['Email']
Formula in all rows within column C:
C1: [=CONCATENATE(A1,".",B1,"@domain.net")]
C2: [=CONCATENATE(A2,".",B2,"@domain.net")]
C3: [=CONCATENATE(A3,".",B3,"@domain.net")]
etc.
Output of 'file.xlsx' through excel & 'file.csv' (via excel > 'Save As' .csv):
A1: ['j']
B1: ['doe']
C1: ['[email protected]']
Output of 'file.csv' after following the Pandas Sample Code:
A1: ['j']
B1: ['doe']
C1: ['']
if a cell does not contain a formula, the conversion outputs correct values within the cells. For the cells with formulas, the cells are empty (since .csv is just plain-text). Is there a way to replicate excel's behavior of running the functions first > save output value into cell > save as .csv?
UPDATE:
So I found the issue, although not sure how to go about solving this. Pandas works as intended when I created a fresh .xlsx and tried the sample code. But it didn't work with the .xlsx in my script - and I narrowed it down to this step
The following is a snippet from my script that copies values from one excel file into another:
wb1 = xl.load_workbook('/file1.xlsx')
ws1 = wb1.worksheets[0]
wb2 = xl.load_workbook('/file2.xlsx')
ws2 = wb2.active
mr = ws1.max_row
mc = ws1.max_column
for i in range (1, mr 1):
for j in range (1, mc 1):
c = ws1.cell(row = i, column = j)
ws2.cell(row = i, column = j).value = c.value
wb2.save('file2.xlsx')
The file ('file2.xlsx'), while seemingly opens and functions just like a regular excel file, DOES NOT preserve its values within cells that have formulas after converting it to a .csv via pandas.
The file ('file1.xlsx') however, does this just fine.
BUT, if I open 'file2.xlsx' and just simply save it (without changing anything), and then try converting it via pandas - it DOES end up preserving the values within formulas.
So there's definitely something wrong in my code (surprise, surprise) that does this. Not sure why, though.
CodePudding user response:
This can be done using Pandas library. Here ,this might help : https://www.geeksforgeeks.org/convert-excel-to-csv-in-python/
CodePudding user response:
SOLVED
I was able to solve my own question - posting it here for anyone else who has a similar issue (searching this problem led me believe ya'll exist, so here you go.)
Note: This only works on a Windows system, with Excel installed
import win32com.client as win32
from win32com.client import Dispatch
from win32com.client import constants as c
excel = Dispatch('Excel.Application') # Calls Excel
excel.DisplayAlerts = False # Disables prompts, such as asking to overwrite files
wb = excel.Workbooks.Open("/file.xlsx") # Input File
wb.SaveAs("/file.csv"), c.xlCSV) # Output File
excel.Application.Quit() # Close Excel
excel.DisplayAlerts = True # Turn alerts back on