Use pd.concat instead of df.append

Time:08-07

I have some code I would like to improve, firstly because it's pretty slow and secondly because append is going to be deprecated. I would like to use concat instead of append for those reasons, but after checking several similar questions on Stack Overflow I haven't figured out how to apply it to my own code. I am sure it has a simple solution, but I just can't find it. I would appreciate any help a lot. Thanks in advance!

import time
from time import sleep
# IMPORT EXCEL LIBRARY AND SYSTEM MODULE
import os
import csv
import pandas as pd
import pandas
import openpyxl
import warnings

with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
# FOLDER ITERATION LIBRARY
from pathlib import Path

# FOR EACH EXCEL FILE IN THE DIRECTORY, DELETE COLUMNS 1-15
INPUT_DIR = Path.cwd() / r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Descargas"
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = pd.read_excel(file)
    if len(df.index) > 12:
        df = df.drop([0,1,2,3,4,5,6,7,8,9,10,11,12], axis = 0)
        df.to_excel(file, engine="openpyxl", header = False, index = False)
    else:
        os.remove(file)

df = pd.DataFrame() 
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index = False)

CodePudding user response:

Your question refers to one specific part of the code: replacing append() with concat(). I also see that you are writing the Excel file inside the loop, so it gets overwritten on every iteration; this is (probably) a mistake and very inefficient as well. This part of the code:

df = pd.DataFrame() 
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index = False)

Can be replaced with:

output = pd.concat([pd.read_excel(x) for x in list(INPUT_DIR.rglob("*.xls*"))], ignore_index=True)
output.to_excel("path",engine="openpyxl",index=False)
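
To see the pattern without needing any Excel files, here is a minimal sketch that uses small in-memory DataFrames as stand-ins for the results of pd.read_excel(file). The helper name combine_frames and the sample data are hypothetical, not part of the original code. It also guards against an empty folder, since pd.concat raises a ValueError when given an empty list.

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate an iterable of DataFrames with a single pd.concat call.

    Hypothetical helper for illustration; not part of the original code.
    """
    frames = list(frames)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all inputs.
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for the per-file results of pd.read_excel(file).
parts = [
    pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}),
    pd.DataFrame({"a": [3], "b": ["z"]}),
]

combined = combine_frames(parts)
print(combined)
```

Collecting the frames in a list and concatenating once avoids copying the growing DataFrame on every iteration, which is what made the append loop slow. The Excel file is then written a single time, after the loop.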