Read in a csv file using wildcards

Time:08-06

I need to read in a CSV file every day, but some of the numbers in the file name change each day. The full path, including the directory, is C:\siglocal\pairoffs\\logs_20220804_084056_9500_capped_delta_for_singlestockdelta.csv

I have tried the code below, where I put an asterisk after the _08 in the file path. The nine characters that follow this part of the file name (eight digits and an underscore, 4056_9500 in the example above) change daily, while the last part of the file name (_capped_delta_for_singlestockdelta.csv) stays the same.

Any ideas what I need to do here?

df = pd.read_csv(r'C:\siglocal\pairoffs\\logs_20220804_08*'   '_capped_delta_for_singlestockdelta.csv')

CodePudding user response:

I do not see how this is a pandas problem. If I understand correctly, you are looking for a way to build a string from variables. For that you can use str.format():

r'C:\siglocal\pairoffs\\logs_20220804_08{0}_capped_delta_for_singlestockdelta.csv'.format(day)
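As a minimal sketch of this idea, assuming the part that changes is already known when the path is built (the question suggests it may not be), the path could be assembled as follows. The function name build_path and its parameters are hypothetical, chosen only for illustration.

```python
from datetime import date


def build_path(folder, day, changing_part):
    """Assemble the full path from a date and the part that changes daily."""
    template = (
        "{folder}\\logs_{day:%Y%m%d}_{changing}"
        "_capped_delta_for_singlestockdelta.csv"
    )
    # {day:%Y%m%d} formats the date object as e.g. 20220804
    return template.format(folder=folder, day=day, changing=changing_part)


# Values taken from the file name in the question
path = build_path(r"C:\siglocal\pairoffs", date(2022, 8, 4), "084056_9500")
print(path)
```

This only helps if the changing digits can be computed ahead of time; if they cannot, the folder has to be searched for a matching name instead.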

CodePudding user response:

Perhaps use os.walk(...) and a regular expression to evaluate the files in the folder. Here's one possible implementation:

import os
import re


# define the folder where the files are located
src_folder = r"C:\_temp"

# define the regular expression to filter the files
file_regex = r"logs_20220804_08([0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9])" \
             r"_capped_delta_for_singlestockdelta\.csv"

for dir_path, dir_names, file_names in os.walk(src_folder):
    # Each iteration contains:
    # dir_path - current folder for the iteration
    # dir_names - list of folders in the dir_path.
    # file_names - list of files in the dir_path.
    for file_name in file_names:
        print("Evaluating file({}) in folder({})"
              .format(file_name, dir_path))
        # fullmatch requires the whole file name to fit the pattern
        match_obj = re.fullmatch(file_regex, file_name, re.I)
    
        # match_obj will be None if there isn't a match
        if match_obj:
            print("{}File({}) matches our regular expression."
                  .format(" " * 5, file_name))
            print("{}Changing number value is: {}"
                  .format(" " * 5, match_obj.group(1)))
        else:
            print("{}No match for file ({})"
                  .format(" " * 5, file_name))
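A different technique from the regular expression above, and closer to the wildcard idea in the question, is the standard library's glob module, which expands the asterisk into concrete file names. The sketch below is only an illustration: the function name find_matching_files and its date_part parameter are hypothetical, and it assumes the newest file sorts last by name.

```python
import glob
import os


def find_matching_files(folder, date_part="20220804_08"):
    """Return the sorted paths in `folder` whose names fit the wildcard pattern."""
    pattern = os.path.join(
        folder,
        "logs_{}*_capped_delta_for_singlestockdelta.csv".format(date_part),
    )
    return sorted(glob.glob(pattern))


# Example usage with the folder from the question
matches = find_matching_files(r"C:\siglocal\pairoffs")
if matches:
    # pd.read_csv needs one concrete path, so a match is picked first, e.g.:
    #   import pandas as pd
    #   df = pd.read_csv(matches[-1])
    print("Would read:", matches[-1])
else:
    print("No file matched the pattern")
```

pd.read_csv does not expand wildcards itself, which is why the original attempt fails; passing it one of the paths returned by glob avoids that.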