I am trying to convert a large (~100 MB) CSV file to XLSX using Python. I tried the solution below, but ran into memory issues:
https://stackoverflow.com/a/62750402/1266723
The solution below solves the memory issue by writing in chunks, but each chunk overwrites the previous one. Can someone please recommend how to append to the same file instead of creating new files or overwriting the same file?
https://stackoverflow.com/a/71237366/1266723
import pandas as pd

n = 1000  # number of rows per chunk
df = pd.read_csv("myFile.csv")
for i in range(0, df.shape[0], n):
    df[i:i+n].to_excel("new_file.xlsx", index=False, header=False)
CodePudding user response:
Have you tried this (see the second-to-last example at the bottom of the page)?
# ExcelWriter can also be used to append to an existing Excel file:
with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet_name_3')
CodePudding user response:
Keeping your code, you can specify the startrow parameter of pd.DataFrame.to_excel so each chunk is written at its own row offset. Pass an ExcelWriter rather than a filename, otherwise each call overwrites the file:

with pd.ExcelWriter("new_file.xlsx") as writer:
    for i in range(0, df.shape[0], n):
        # Each chunk starts at row i of the sheet.
        df[i:i+n].to_excel(writer, index=False, header=False, startrow=i)
CodePudding user response:
You can use an ExcelWriter to do multiple writes to an xlsx file.
You could even read the csv file in chunks:
n = 1000  # chunksize
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
     pd.ExcelWriter('new_file.xlsx') as writer:
    for i, df in enumerate(reader):
        df.to_excel(writer, index=False, header=False, startrow=i * n)
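As a self-contained sketch of that chunked approach (the file names and the tiny generated CSV are illustrative; using read_csv as a context manager needs pandas >= 1.2, and writing xlsx needs openpyxl installed):

```python
import pandas as pd

# Illustrative input: a small CSV standing in for the ~100 MB file.
pd.DataFrame({"x": range(10), "y": range(10)}).to_csv("myFile.csv", index=False)

n = 4  # chunksize; something like 100_000 is more realistic for a large file

# Stream the CSV in chunks and write each chunk into the same sheet at
# offset startrow=i*n, so nothing is overwritten and the full frame
# never has to fit in memory at once.
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
     pd.ExcelWriter("new_file.xlsx") as writer:
    for i, chunk in enumerate(reader):
        chunk.to_excel(writer, index=False, header=False, startrow=i * n)
```

Because the chunks are written with header=False, the column names are dropped; write the header once at startrow=0 (and offset the data rows by one) if you need to keep it.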