I am trying to convert a large (~100 MB) CSV file to XLSX using Python. I tried the solution below, but ran into memory issues:
https://stackoverflow.com/a/62750402/1266723
The solution below solves the memory issue by writing in chunks, but each chunk overwrites the previous one. Can someone please recommend how to append to the same file instead of creating new files or overwriting the same file?
https://stackoverflow.com/a/71237366/1266723
import pandas as pd

n = 1000  # number of rows per chunk
df = pd.read_csv("myFile.csv")
for i in range(0, df.shape[0], n):
    df[i:i+n].to_excel("new_file.xlsx", index=False, header=False)
CodePudding user response:
Have you tried this (see the second-to-last example at the bottom of the page)?
# ExcelWriter can also be used to append to an existing Excel file:
with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet_name_3')
CodePudding user response:
Keeping your code, you can specify the startrow parameter of pd.DataFrame.to_excel so each chunk is written at its own row offset. Pass an ExcelWriter rather than a filename, otherwise each call overwrites the file:

with pd.ExcelWriter("new_file.xlsx") as writer:
    for i in range(0, df.shape[0], n):
        # Each chunk starts at row i of the sheet.
        df[i:i+n].to_excel(writer, index=False, header=False, startrow=i)
CodePudding user response:
You can use an ExcelWriter to do multiple writes to an xlsx file.
You could even read the csv file in chunks:
n = 1000  # chunksize
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
     pd.ExcelWriter('new_file.xlsx') as writer:
    for i, df in enumerate(reader):
        df.to_excel(writer, index=False, header=False, startrow=i * n)
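As a self-contained sketch of that chunked approach (the file names and the tiny generated CSV are illustrative; using read_csv as a context manager needs pandas >= 1.2, and writing xlsx needs openpyxl installed):

```python
import pandas as pd

# Illustrative input: a small CSV standing in for the ~100 MB file.
pd.DataFrame({"x": range(10), "y": range(10)}).to_csv("myFile.csv", index=False)

n = 4  # chunksize; something like 100_000 is more realistic for a large file

# Stream the CSV in chunks and write each chunk into the same sheet at
# offset startrow=i*n, so nothing is overwritten and the full frame
# never has to fit in memory at once.
with pd.read_csv("myFile.csv", chunksize=n) as reader, \
     pd.ExcelWriter("new_file.xlsx") as writer:
    for i, chunk in enumerate(reader):
        chunk.to_excel(writer, index=False, header=False, startrow=i * n)
```

Because the chunks are written with header=False, the column names are dropped; write the header once at startrow=0 (and offset the data rows by one) if you need to keep it.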