Open certain amount of files by glob

Time:10-14

I'm trying to use glob to open the Excel files in one folder and concatenate them into a single file, but opening every file and then concatenating takes quite a long time (each file contains around 20,000 rows).

So I would like to ask: is there any way to open only a certain number of files using glob, for example the 30 most recent files out of all of them? Or is there another way to do this?

Thanks and best regards

CodePudding user response:

Or is there another way to make it

I generally deal with this by using os.listdir to list all the files in a given directory (e.g. path_to_files), reading each one with the pandas read_csv or read_excel method, and appending the results to list_of_dataframes, which is then concatenated:

import os
import pandas as pd
from pathlib import Path

path_to_files = Path('...')  # The path to the folder containing your Excel files

list_of_dataframes = []
for myfile in os.listdir(path_to_files):
    pathtomyfile = path_to_files / myfile
    # The files are Excel workbooks, so use read_excel (read_csv is for CSV files)
    list_of_dataframes.append(pd.read_excel(pathtomyfile))

df = pd.concat(list_of_dataframes)

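The number of files to load can be limited by slicing the list. Note that os.listdir returns entries in arbitrary order, so sort the names first if "last" is meant to mean something. For example, for the last 30 files by name:

for myfile in sorted(os.listdir(path_to_files))[-30:]: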
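If "recent" means most recently modified rather than last by name, one option is to sort the matches by modification time before slicing. The following is a minimal sketch under that assumption; the helper name latest_files and its parameters are hypothetical, not part of the answer above.

```python
from pathlib import Path


def latest_files(folder, pattern="*.xlsx", n=30):
    """Return up to n files in folder matching pattern, newest first."""
    matches = [p for p in Path(folder).glob(pattern) if p.is_file()]
    # Sort by modification time (newest first), then keep the first n.
    matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return matches[:n]
```

The returned paths could then be read and concatenated as before, e.g. pd.concat([pd.read_excel(p) for p in latest_files(path_to_files)]).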

CodePudding user response:

I've been using this code for merging mp4 files, but of course you could use it with other files:

import os
import shutil
from glob import iglob
from typing import Iterator

def enumerate_files(source: str, ext: str = "*") -> Iterator[str]:
    """Lazily yield the paths of files in source that have the given extension."""
    return iglob(os.path.join(source, f"*.{ext}"))

def combine_files(source: str, output: str, ext: str = "mp4") -> None:
    """Append the raw bytes of every matching file to output."""
    folder = os.path.join(os.getcwd(), source)
    with open(output, "wb") as wb:
        for file in enumerate_files(folder, ext):
            with open(file, "rb") as fd:
                shutil.copyfileobj(fd, wb)
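If the goal is only to cap how many files get opened, the lazy iterator returned by iglob can be cut short with itertools.islice. This is a minimal sketch; the helper name first_n_files is hypothetical. glob does not guarantee any particular order, so this returns an arbitrary n matches, not necessarily the most recent ones.

```python
import os
from glob import iglob
from itertools import islice


def first_n_files(source, ext="xlsx", n=30):
    """Return at most n paths in source with the given extension (order not guaranteed)."""
    pattern = os.path.join(source, f"*.{ext}")
    # islice stops pulling from the iglob iterator after n matches.
    return list(islice(iglob(pattern), n))
```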

CodePudding user response:

If the files follow a naming convention, you could select only the files that meet a certain requirement, for example:

list_of_files = glob.glob('*criteria*')

Or you could get the full list and then split it into n subsets, for example two halves:

import glob

list_of_files = glob.glob('*.xlsx')
length = len(list_of_files)
center = length // 2

a = list_of_files[:center]  # first half
b = list_of_files[center:]  # second half
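The same idea extends from two halves to batches of a fixed size, so each batch can be read and concatenated separately. This is a minimal sketch; the helper name split_into_chunks is hypothetical.

```python
def split_into_chunks(items, size):
    """Split a list into consecutive chunks of at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in strides of size; the last chunk may be shorter.
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, split_into_chunks(list_of_files, 30) would give batches of up to 30 file names each.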