Hi, I'm new to Python and to coding in general; this is my very first post.
I am trying to open the 20 most recent files in a folder and concatenate them into a single dataframe.
I succeed in doing so when working with a test folder that contains only 100 files, but as soon as I run the code against the real folder, which contains about 10k files, it becomes very slow and takes around 5 minutes to finish.
Here is my attempt:
import pandas as pd
import glob
from datetime import datetime
import numpy as np
import os
path = r'K:/industriel/abc/03_LOG/PRODUCTION/CSV/'
path2 = r'K:/industriel/abc/03_LOG/PRODUCTION/IMG/'
os.chdir(path)
files = glob.glob(path + "/*.csv")
#files = filter(os.path.isfile, os.listdir(path))
files = [os.path.join(path, f) for f in files]
files.sort(key=lambda x: os.path.getctime(x), reverse=False)
dfs = pd.DataFrame()
for i in range(20):
    dfs = dfs.append(pd.read_csv(files[i].split('\\')[-1], delimiter=';', usecols=[0,1,3,4,9,10,20]))
dfs = dfs.reset_index(drop=True)
print(dfs.head(10))
CodePudding user response:
Try reading all the individual files into a list, and then concat them to form your dataframe at the end:
files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".csv")]
files.sort(key=lambda x: os.path.getctime(x), reverse=False)
dfs = list()
for file in files[:20]:
    dfs.append(pd.read_csv(file, delimiter=';', usecols=[0,1,3,4,9,10,20]))
dfs = pd.concat(dfs)
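One more thing worth checking: with 10k files on a network drive (`K:`), the 5 minutes is likely dominated by the per-file `stat()` call that `os.path.getctime` makes for the sort, not by pandas. A sketch of a faster version, assuming you want the 20 *newest* files (the helper name `read_newest_csvs` is mine, not from the question): `os.scandir` returns `DirEntry` objects whose stat info is often served from the directory listing itself, and `heapq.nlargest` picks the top 20 without sorting all 10k entries.

```python
import heapq
import os

import pandas as pd


def read_newest_csvs(path, n=20, usecols=None):
    """Concatenate the n most recently created CSV files in `path`."""
    # One pass over the directory; DirEntry.stat() is usually cheap
    # because the OS already fetched the metadata with the listing.
    with os.scandir(path) as it:
        entries = [e for e in it if e.name.endswith('.csv') and e.is_file()]

    # Pick the n newest entries without fully sorting all of them.
    newest = heapq.nlargest(n, entries, key=lambda e: e.stat().st_ctime)

    # Build the result with a single concat instead of repeated appends.
    return pd.concat(
        (pd.read_csv(e.path, delimiter=';', usecols=usecols) for e in newest),
        ignore_index=True,
    )
```

For the folder in the question that would be `read_newest_csvs(r'K:/industriel/abc/03_LOG/PRODUCTION/CSV/', usecols=[0,1,3,4,9,10,20])`.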
CodePudding user response:
You can use pd.concat() with a list comprehension over the read files. Replace your code after files.sort(...) with the following:
dfs = pd.concat([
    pd.read_csv(file, delimiter=';', usecols=[0,1,3,4,9,10,20])
    for file in files[:20]
])
dfs = dfs.reset_index(drop=True)
print(dfs.head(10))
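For context on why the original loop is slow even for 20 files: `DataFrame.append` (deprecated and removed in pandas 2.0) copies the entire accumulated frame on every iteration, so a single `pd.concat` at the end is both the idiomatic and the fast approach. A minimal self-contained demo with made-up data:

```python
import pandas as pd

# Two small frames standing in for the per-file reads.
frames = [
    pd.DataFrame({'value': [1, 2]}),
    pd.DataFrame({'value': [3, 4]}),
]

# One concat builds the result once and preserves the order of the
# input frames; ignore_index gives a clean 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)
print(combined['value'].tolist())  # → [1, 2, 3, 4]
```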