Copy half of the rows of the .csv file and save in .xlsx (in Python, Pandas or other module)-CodePudding

I need to convert .csv file to .xlsx file in Python, size of the file is 446mb, but if I am using next code:

import pandas as pd

read_file = pd.read_csv(r'D:\Назар\Бізнес\Реалізація\External_trade_data_01.2021-12.2021_UK.csv') read_file.to_excel(r'D:\Назар\Бізнес\Реалізація\test.xlsx', index=None, header=True)

The result is: ValueError: This sheet is too large! Your sheet size is: 2 006 894, 28 Max sheet size is: 1 048 576, 16384

How I can copy a header and the first 1 003 447 lines of the .csv file and save in .xlsx file (or csv, then I will convert the whole file to .xlsx), than copy next 1 003 447 lines of the .csv file and save to another .xlsx file.

Thank you!

CodePudding user response：

You can split the csv dataframe into multiple dataframes of smaller size (chunks of 500,000 rows for example) using pandas. Then, you can use to_excel to convert them.

chunksize = 500000
source_path = r'D:\Назар\Бізнес\Реалізація\External_trade_data_01.2021-12.2021_UK.csv'
for i,chunk in enumerate(pd.read_csv(source_path, chunksize=500000, dtype=dtypes)):
    chunk.to_excel('split_csv/chunk{}.xls'.format(i), index=None, header=True)

You can also check to_excel() options if you can export them to one file with multiple sheets instead of having multiple files.

CodePudding user response：

size_file = read_file.shape[0]
read_file1 = read_file.iloc[:int(size_file/2), :]
read_file2 = read_file.iloc[int(size_file/2):, :]
read_file1.to_excel(r'D:\Назар\Бізнес\Реалізація\test1.xlsx', index=None, header=True)
read_file2.to_excel(r'D:\Назар\Бізнес\Реалізація\test2.xlsx', index=None, header=True)