I need to convert .csv file to .xlsx file in Python, size of the file is 446mb, but if I am using next code:
import pandas as pd
read_file = pd.read_csv(r'D:\Назар\Бізнес\Реалізація\External_trade_data_01.2021-12.2021_UK.csv') read_file.to_excel(r'D:\Назар\Бізнес\Реалізація\test.xlsx', index=None, header=True)
The result is: ValueError: This sheet is too large! Your sheet size is: 2 006 894, 28 Max sheet size is: 1 048 576, 16384
How I can copy a header and the first 1 003 447 lines of the .csv file and save in .xlsx file (or csv, then I will convert the whole file to .xlsx), than copy next 1 003 447 lines of the .csv file and save to another .xlsx file.
Thank you!
CodePudding user response:
You can split the csv dataframe into multiple dataframes of smaller size (chunks of 500,000 rows for example) using pandas
. Then, you can use to_excel
to convert them.
chunksize = 500000
source_path = r'D:\Назар\Бізнес\Реалізація\External_trade_data_01.2021-12.2021_UK.csv'
for i,chunk in enumerate(pd.read_csv(source_path, chunksize=500000, dtype=dtypes)):
chunk.to_excel('split_csv/chunk{}.xls'.format(i), index=None, header=True)
You can also check to_excel()
options if you can export them to one file with multiple sheets instead of having multiple files.
CodePudding user response:
size_file = read_file.shape[0]
read_file1 = read_file.iloc[:int(size_file/2), :]
read_file2 = read_file.iloc[int(size_file/2):, :]
read_file1.to_excel(r'D:\Назар\Бізнес\Реалізація\test1.xlsx', index=None, header=True)
read_file2.to_excel(r'D:\Назар\Бізнес\Реалізація\test2.xlsx', index=None, header=True)