Python: convert CSV to XLSX and split into multiple Excel files

I need to convert a large CSV file (about 5 million rows) to XLSX. Each output XLSX file should contain 100,000 rows and be saved separately.

import pandas as pd

data = pd.read_csv("k.csv")

data.to_excel("new_file.xlsx", index=None, header=True)

How should I add a row-count parameter?

CodePudding user response：

The following approach splits your k.csv into chunks of n rows each. Each chunk is written to its own numbered file, e.g. new_file001.xlsx

import pandas as pd

n = 100000  # number of rows per chunk
df = pd.read_csv("k.csv")

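# Step through the frame n rows at a time; number the output files from 1.
for file_no, start in enumerate(range(0, df.shape[0], n), start=1):
    df.iloc[start:start + n].to_excel(f"new_file{file_no:03}.xlsx", index=False, header=True)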
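
With about 5 million rows, loading the whole CSV at once may use a lot of memory. A possible alternative, as a rough sketch, is to let read_csv hand back the file in pieces through its chunksize parameter, so only one 100,000-row piece is in memory at a time. The helper names iter_csv_chunks and split_csv_to_xlsx and the "new_file" prefix are made up for illustration, and writing .xlsx assumes an Excel engine such as openpyxl is installed.

```python
import io

import pandas as pd


def iter_csv_chunks(source, rows_per_file=100_000):
    """Yield (file_number, DataFrame) pairs with at most rows_per_file rows each."""
    # chunksize makes read_csv return an iterator of DataFrames
    # instead of loading the whole file into one frame.
    reader = pd.read_csv(source, chunksize=rows_per_file)
    for file_no, chunk in enumerate(reader, start=1):
        yield file_no, chunk


def split_csv_to_xlsx(source, prefix="new_file", rows_per_file=100_000):
    """Write each chunk to its own workbook and return the paths written.

    Hypothetical helper; needs an Excel writer engine (e.g. openpyxl).
    """
    written = []
    for file_no, chunk in iter_csv_chunks(source, rows_per_file):
        path = f"{prefix}{file_no:03}.xlsx"
        chunk.to_excel(path, index=False, header=True)
        written.append(path)
    return written


# Small in-memory check of the chunking only (no files are written here).
sample = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(25)))
sizes = [len(chunk) for _, chunk in iter_csv_chunks(sample, rows_per_file=10)]
```

For the file in the question, the call would be something like split_csv_to_xlsx("k.csv"), which should produce around 50 workbooks for 5 million rows.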

CodePudding user response：

Use iloc to slice the DataFrame as many times as needed.

For example, to get rows 10000 to 19999 (the end of an iloc slice is exclusive), you would use:

subset = data.iloc[10000:20000]
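
To repeat that slice for every block of rows, one option is to compute the start and stop positions first and then apply iloc to each pair. This is only a sketch: slice_bounds is a hypothetical helper, and the 25-row frame stands in for the real data so the result is easy to inspect.

```python
import math

import pandas as pd


def slice_bounds(total_rows, rows_per_file):
    """Return (start, stop) pairs covering total_rows in blocks of rows_per_file."""
    parts = math.ceil(total_rows / rows_per_file)
    return [
        (k * rows_per_file, min((k + 1) * rows_per_file, total_rows))
        for k in range(parts)
    ]


# Stand-in for the real DataFrame read from the CSV.
data = pd.DataFrame({"value": range(25)})

# iloc slices are end-exclusive, so (10, 20) selects rows 10 to 19.
subsets = [data.iloc[start:stop] for start, stop in slice_bounds(len(data), 10)]
```

Each element of subsets can then be passed to to_excel with its own file name.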