I'm trying to use glob to open the Excel files in one folder and then concatenate them into one file, but it takes quite a long time to open all the files and concatenate them (each file contains around 20,000 rows).
So I would like to ask: is there any way to open only a certain number of files using glob, e.g. the 30 most recent files out of all the files? Or is there another way to do it?
Thanks and best regards
CodePudding user response:
Or is there another way to do it?
I generally deal with this by using os.listdir to list all available files in a given directory (e.g. path_to_files), then opening them with the pandas read_csv or read_excel function and appending them to a list_of_dataframes to concatenate:
import os
import pandas as pd
from pathlib import Path
path_to_files = Path('...')  # The path to the folder containing your Excel files
list_of_dataframes = []
for myfile in os.listdir(path_to_files):
    pathtomyfile = path_to_files / myfile
    # Use pd.read_csv here instead if the files are CSV rather than Excel
    list_of_dataframes.append(pd.read_excel(pathtomyfile))
df = pd.concat(list_of_dataframes)
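The number of files to load can be limited by slicing the list. os.listdir returns names in arbitrary order, so sort by modification time first to get the 30 most recent files:
recent_files = sorted(os.listdir(path_to_files), key=lambda name: os.path.getmtime(path_to_files / name))[-30:]
for myfile in recent_files: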
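Since the question asks about glob specifically, the same idea can be sketched with glob.glob. This is only a minimal sketch: most_recent_files is a hypothetical helper name, and the "*.xlsx" pattern and the default of 30 are assumptions about your folder, not something stated in the question.

```python
import glob
import os


def most_recent_files(folder, pattern="*.xlsx", n=30):
    """Return up to n paths in folder matching pattern, oldest first, newest last."""
    paths = glob.glob(os.path.join(folder, pattern))
    # glob makes no promise about ordering, so sort by last-modified time.
    paths.sort(key=os.path.getmtime)
    # Guard n <= 0 explicitly, because paths[-0:] would return the whole list.
    return paths[-n:] if n > 0 else []
```

The returned paths can then be passed to pd.read_excel and pd.concat exactly as in the loop above. Modification time is only one possible meaning of "recent"; if your file names contain a date, sorting by name may be more reliable.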
CodePudding user response:
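I've been using this code for merging mp4 files, but you could adapt it to other file types. Note that it concatenates raw bytes, so it suits formats that tolerate that (plain text, for example) rather than Excel workbooks: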
import os
import shutil
from glob import iglob
from typing import Iterator

def enumerate_files(source: str, ext: str = "*") -> Iterator[str]:
    """Enumerate the files in the folder `source` that have the given extension."""
    return iglob(os.path.join(source, f"*.{ext}"))

def combine_files(source: str, output: str, ext: str = "mp4") -> None:
    """Append the raw bytes of every matching file to `output`."""
    folder = os.path.join(os.getcwd(), source)
    with open(output, "wb") as wb:
        for file in enumerate_files(folder, ext):
            with open(file, "rb") as fd:
                shutil.copyfileobj(fd, wb)
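Because iglob returns an iterator, it can also be combined with itertools.islice to take only a fixed number of paths, which is close to what the question asks for. The following is a rough sketch under assumptions: first_n_files is a hypothetical name, and the "xlsx" extension and limit of 30 are placeholders.

```python
import os
from glob import iglob
from itertools import islice


def first_n_files(folder, ext="xlsx", n=30):
    """Return at most n paths from folder that have the given extension."""
    # islice simply stops taking paths from the iterator after n of them.
    # The order is whatever the filesystem reports, not newest-first.
    return list(islice(iglob(os.path.join(folder, f"*.{ext}")), n))
```

This limits how many files you go on to open, which is where most of the time is spent, but it does not pick the most recent ones; for that the paths still need to be sorted, for example by modification time.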
CodePudding user response:
If there is a certain naming convention, you could select only the files that match a certain pattern, for example:
import glob
list_of_files = glob.glob('*criteria*')
Or you could get the full list and then split it into subsets, e.g. two halves:
list_of_files = glob.glob('*.xlsx')
length = len(list_of_files)
center = length // 2
a = list_of_files[:center]
b = list_of_files[center:]
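The two-halves split above can be generalised to batches of any size. Here is a small sketch of that; split_into_chunks is a hypothetical helper and the file names are made up for illustration.

```python
def split_into_chunks(items, chunk_size):
    """Split items into consecutive lists of at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Hypothetical file names, purely for illustration.
names = [f"report_{i}.xlsx" for i in range(7)]
batches = split_into_chunks(names, 3)
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Each batch can then be read and concatenated on its own, so you never hold every file in memory at once.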