Python code to split an .xlsx file with 52000 rows into 11 Excel files

I am trying to write Python code to split an .xlsx file with 52,000 rows into 11 Excel files (10 .xlsx files with 5,000 rows each and 1 file with the remaining 2,000 records).

I'm having a hard time finding a good solution online.

I tried the solution below. It produces exactly the desired output, but it also produces 49,996 blank .xlsx files.

import pandas as pd

df = pd.read_excel(r"C:\Users\rajat.kapoor\Desktop\All Data Combined\Credit Check Data.xlsx")
n_partitions = 5000

for i in range(n_partitions):
    sub_df = df.iloc[(i*n_partitions):((i+1)*n_partitions)]
    sub_df.to_excel(rf"C:\Users\rajat.kapoor\Desktop\All Data Combined\Credit Check Data - {i}.xlsx", sheet_name="a")

CodePudding user response：

Instead of pandas, you could also use openpyxl like below, to split your data into different files:

import openpyxl

# Open the .xlsx file using openpyxl
wb = openpyxl.load_workbook('data.xlsx')

# Get the sheet name
sheet_name = wb.sheetnames[0]

# Get the sheet
sheet = wb[sheet_name]

# Set the number of rows per file
rows_per_file = 5000

# Set the starting row and ending row for the first file
start_row = 1
end_row = rows_per_file

# Set the starting file number
file_number = 1

# Remember the last row that holds data before iterating
max_row = sheet.max_row

# Iterate over the sheet in blocks of rows_per_file rows
while start_row <= max_row:
    # Clamp the last block so it never reads past the end of the data
    end_row = min(start_row + rows_per_file - 1, max_row)

    # Create a new workbook for the current file
    file_wb = openpyxl.Workbook()

    # Get the active sheet in the new workbook
    file_sheet = file_wb.active

    # Copy the rows of the current block; append() writes them starting at row 1
    for row in sheet.iter_rows(min_row=start_row, max_row=end_row, values_only=True):
        file_sheet.append(row)

    # Save the new workbook with the current file number
    file_wb.save(f'data_{file_number}.xlsx')

    # Increment the file number
    file_number += 1

    # Set the starting row for the next file
    start_row = end_row + 1

This code first loads the .xlsx file using openpyxl and gets the first sheet by name. It then sets the number of rows per file (in this case, 5000) and the starting and ending rows for the first file. The loop then walks through the sheet one block of rows at a time, creating a new workbook for each block, copying the cell values into it, and saving it under the current file number. The loop continues until all of the rows in the sheet have been processed.
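The row arithmetic in a loop like this is easy to get wrong by one, because openpyxl rows are 1-based and the ranges are inclusive. As a rough sketch, the block boundaries can be computed separately and checked on their own. `row_blocks` is a hypothetical helper name used only for illustration; it is not part of openpyxl.

```python
def row_blocks(max_row, rows_per_file):
    """Yield inclusive, 1-based (start_row, end_row) pairs covering rows 1..max_row."""
    start_row = 1
    while start_row <= max_row:
        # Clamp the final block so it stops at the last row that holds data
        end_row = min(start_row + rows_per_file - 1, max_row)
        yield start_row, end_row
        start_row = end_row + 1


blocks = list(row_blocks(52000, 5000))
print(len(blocks))   # 11
print(blocks[0])     # (1, 5000)
print(blocks[-1])    # (50001, 52000)
```

Each pair could then be passed to `sheet.iter_rows(min_row=start_row, max_row=end_row)` to read one block of rows.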

CodePudding user response：

You can change the step of the range from 1 to 5000, so that only 11 Excel files are created. The rest of the code stays the same, so only the part that differs is shown below.

for i in range(0, 52000, 5000):
    sub_df = df.iloc[i:i + 5000]
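The blank files in the question come from looping `range(5000)` times: once `i*5000` passes the last row, `df.iloc` returns an empty slice, and each empty slice is still written to its own file. A minimal sketch of the full loop, deriving the range from `len(df)` instead of hard-coding 52000, might look like the following. The names `chunk_bounds` and `split_excel` and the `out_pattern` argument are assumptions made for illustration, not part of pandas.

```python
def chunk_bounds(n_rows, chunk_size):
    """Return (start, stop) offsets covering n_rows in blocks of chunk_size."""
    return [(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


def split_excel(src_path, out_pattern, chunk_size=5000):
    """Write each block of rows from src_path to its own .xlsx file.

    out_pattern is a hypothetical format string such as "part_{}.xlsx".
    """
    import pandas as pd  # imported here so chunk_bounds works without pandas

    df = pd.read_excel(src_path)
    bounds = chunk_bounds(len(df), chunk_size)
    for number, (start, stop) in enumerate(bounds, start=1):
        # iloc uses 0-based, half-open slices, so stop is exclusive
        df.iloc[start:stop].to_excel(out_pattern.format(number), index=False)


bounds = chunk_bounds(52000, 5000)
print(len(bounds), bounds[0], bounds[-1])   # 11 (0, 5000) (50000, 52000)
```

Calling something like `split_excel("data.xlsx", "data_{}.xlsx")` would then write one file per block, with no empty files.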