I have some code I would like to improve, first because it is quite slow and second because append is going to be deprecated. I would like to use concat instead of append, but after checking several similar questions on Stack Overflow I haven't figured out how to apply it to my own code. I am sure there is a simple solution, but I just can't find it. I would appreciate any help. Thanks in advance!
import time
from time import sleep
# Import the Excel library and the system module
import os
import csv
import pandas as pd
import pandas
import openpyxl
import warnings
with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
# Folder iteration library
from pathlib import Path
# For each Excel file in the directory, delete columns 1-15
INPUT_DIR = Path.cwd() / r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Descargas"
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = pd.read_excel(file)
    if len(df.index) > 12:
        df = df.drop([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], axis=0)
        df.to_excel(file, engine="openpyxl", header=False, index=False)
    else:
        os.remove(file)
df = pd.DataFrame()
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index=False)
CodePudding user response:
Your question is about one specific part of the code: replacing append() with concat(). I also see that you write the output Excel file inside the loop, so it is overwritten on every iteration; this is (probably) a mistake and is very inefficient as well. This part of the code:
df = pd.DataFrame()
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index=False)
Can be replaced with:
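# ignore_index is an argument of pd.concat, not of pd.read_excel
output = pd.concat([pd.read_excel(x) for x in INPUT_DIR.rglob("*.xls*")], ignore_index=True)
# Write the combined result once, after all files have been read
output.to_excel("path", engine="openpyxl", index=False)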
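To see why this is faster, here is a minimal sketch of the same pattern using small in-memory frames instead of Excel files. The helper name combine_frames and the columns meter and kwh are made up for illustration; in your script each frame would come from pd.read_excel(file). The idea is to collect all frames first and call pd.concat once, rather than growing a DataFrame inside the loop, which copies the accumulated data on every iteration.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate an iterable of DataFrames into one, renumbering the index.

    Hypothetical helper for illustration; not part of pandas.
    """
    frames = list(frames)
    if not frames:
        # pd.concat raises ValueError when given nothing to concatenate,
        # so return an empty frame instead (e.g. when no files matched).
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Stand-ins for the per-file results of pd.read_excel(file).
parts = [
    pd.DataFrame({"meter": ["A", "B"], "kwh": [10, 20]}),
    pd.DataFrame({"meter": ["C"], "kwh": [30]}),
]

combined = combine_frames(parts)
print(combined)
```

With your files, the call would look like combine_frames(pd.read_excel(f) for f in INPUT_DIR.rglob("*.xls*")), followed by a single to_excel on the result. The empty-list guard matters in your case because the first loop may delete every file.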