I need to convert a large CSV file (about 5 million rows) to XLSX. Each output XLSX file should contain 100,000 rows and be saved separately.
import pandas as pd
data = pd.read_csv("k.csv")
data.to_excel("new_file.xlsx", index=None, header=True)
How should I add a row-count parameter?
CodePudding user response:
The following approach splits your k.csv into chunks of n rows each. Each chunk is written to its own file, named after the index of its first row, e.g. new_file000.xlsx, new_file100000.xlsx.
import pandas as pd
n = 100000 # number of rows per chunk
df = pd.read_csv("k.csv")
for i in range(0, df.shape[0], n):
    # rows i .. i+n-1 go into one workbook; the file name carries the starting row index
    df[i:i + n].to_excel(f"new_file{i:03}.xlsx", index=False, header=True)
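If loading all 5 million rows at once is a concern, a possible variation is to let read_csv hand back the file piece by piece through its chunksize parameter. This is only a sketch: the helper names iter_csv_parts and write_parts, the prefix argument, and the sequential 001, 002, ... numbering are assumptions for illustration, not part of the answer above.

```python
import pandas as pd


def iter_csv_parts(csv_path, rows_per_file=100_000, prefix="new_file"):
    """Yield (output_name, chunk) pairs, reading rows_per_file rows at a time."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of one large DataFrame.
    reader = pd.read_csv(csv_path, chunksize=rows_per_file)
    for part, chunk in enumerate(reader, start=1):
        yield f"{prefix}{part:03}.xlsx", chunk


def write_parts(csv_path, rows_per_file=100_000, prefix="new_file"):
    """Write every chunk to its own workbook (hypothetical helper)."""
    for name, chunk in iter_csv_parts(csv_path, rows_per_file, prefix):
        chunk.to_excel(name, index=False, header=True)
```

Calling write_parts("k.csv") would then produce new_file001.xlsx, new_file002.xlsx and so on, with only one chunk held in memory at a time.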
CodePudding user response:
Use iloc to slice the DataFrame as many times as needed.
For example, to get rows 10000 to 19999, you would use:
subset = data.iloc[10000:20000]
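To cover the whole file, the same slice can be repeated in a loop. The following is a rough sketch of that idea; the helper name iter_row_slices and the sequential part numbers are made up for illustration.

```python
import pandas as pd


def iter_row_slices(data, rows_per_file):
    """Yield (part_number, subset) pairs with at most rows_per_file rows each."""
    for part, start in enumerate(range(0, len(data), rows_per_file), start=1):
        # iloc slices by position; the end of the slice is exclusive,
        # and a slice running past the last row is simply shorter.
        yield part, data.iloc[start:start + rows_per_file]
```

Each subset could then be saved with something like subset.to_excel(f"new_file{part:03}.xlsx", index=False, header=True), using data = pd.read_csv("k.csv") and rows_per_file = 100000 as in the question.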