Ignoring Part of Filename Using pd.read

I am very new to Python and Pandas and am looking for some help.

I have written Python code in order to read a monthly Excel file, manipulate the data and append it to another file. My issue is that every month the original file is renamed to match the year and date it is generated, for example last month's was "2022 McCarthy LEM_2022 Financial Reporting_20220329."

Is there a way to use pd.read_excel and have it pick up any .xlsx file name in that folder that contains "McCarthy" "LEM" or "Financial Reporting?"

I am currently just using the simple code of:

    pd.read_excel("2022 McCarthy LEM_2022 Financial 
    Reporting_20220329.xlsx",  sheet_name = "2022 Billable Work Order 
    Data")

Or is there a different approach I should take to remedy this issue?

Ideally, I would receive the report in an email, save the report to a folder, and then would like the start the Python script without having to change the name of the file in the code.

Thank you for any ideas and help!

CodePudding user response：

You can use something like glob which is included in the python standard library.

import glob

substring_list = ["McCarthy", "LEM", "Financial Reporting"]

file_lists = glob.glob("path/to/directory/*.xlsx")
for file in file_lists:

    if any(substring in file for substring in substring_list):
      
        pd.read_excel(file,  sheet_name = "2022 Billable Work Order Data")

I have used the more basic method of checking if the substring is in the string, rather than regex, however glob can also accept regex so you can do it that way. However, if you are just interested in a particular group of substrings then this might be easier for you to understand and change if you need to.