Similarly to the topic "Open only files containing dates of past seven days in filename", I want to open only those files which follow a very rigid naming rule, and extract part of the filename to do a date comparison.
My filename is built like this:
{Custom-prefix}_{SupplierName}_{8_digit_date}.csv
An example:
myprefix_Shop_no24_20221009.csv
So the supplier name can contain underscores, but the parts of the filename are separated by underscores as well.
I do have the complete list of {SupplierName} values, but it can change over time and I would like to avoid a solution that hard-codes them. A {SupplierName} can contain digits, can vary in length, and can include "_".
I tried this:
prefix = "Custom-prefix"
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)([0-9]{8})(?=\.csv)")
# I get the filenames via os.walk
matched = pattern.search(filename)
but this seems to match everything that sits between "Custom-prefix" and ".csv".
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)(?=\.csv)")
gives me the exact same result. The way I understand this, I have to make the regex aware that it has to match the individual parts of the string and respect the underscores, so that each group of my filename:
myprefix
_
Shop_no24
_
20221009
.csv
gets recognized. I found a solution for matching underscores in names here, but I am unfortunately not able to build the regex myself and match the found groups afterwards to do the date comparison.
Thank you in advance
Answer:
You can use
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
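Note the doubled braces: inside an f-string, a literal { or } must be written as {{ or }}. With single braces, {8} is treated as a replacement field and is replaced by the value 8, so [0-9]{8} in your attempt became [0-9]8.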
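To see the effect of the braces, here is a minimal sketch that prints the pattern text the f-string actually produces. It only reuses the prefix from the question; the variable names single and doubled are just for illustration.

```python
prefix = "Custom-prefix"

# Single braces: {8} is a replacement field and evaluates to the value 8
single = fr"{prefix}_(\w*)_([0-9]{8})\.csv"

# Doubled braces: {{8}} produces a literal {8}, i.e. a regex quantifier
doubled = fr"{prefix}_(\w*)_([0-9]{{8}})\.csv"

print(single)   # Custom-prefix_(\w*)_([0-9]8)\.csv
print(doubled)  # Custom-prefix_(\w*)_([0-9]{8})\.csv
```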
See the Python demo:
import re
filename = "Custom-prefix_Shop_no24_20221009.csv"
prefix = "Custom-prefix"
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
matched = pattern.search(filename)
if matched:
    supplier, year, month, day = matched.groups()
    print(f'supplier={supplier}, year={year}, month={month}, day={day}')
Output:
supplier=Shop_no24, year=2022, month=10, day=09
With the (\d{4})(\d{2})(\d{2}) part, you capture all date parts into separate groups so that you can manipulate them however you see fit.
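For the date comparison itself, one possible sketch is shown below. It is not the only way to do it and rests on a few assumptions: the helper name is_recent is made up for illustration, the window is taken to be "the last seven days, inclusive of today", the date is captured as one 8-digit group instead of three, and the filenames are bare names (as yielded by os.walk) so that fullmatch can be used. The re.escape call is an added precaution in case the prefix ever contains regex metacharacters.

```python
import re
from datetime import date, datetime, timedelta

prefix = "Custom-prefix"
# One capture group for the supplier, one for the 8-digit date
pattern = re.compile(fr"{re.escape(prefix)}_(\w*)_(\d{{8}})\.csv")

def is_recent(filename, today, days=7):
    """Return True if filename matches the pattern and its date
    lies within the last `days` days (inclusive of `today`)."""
    matched = pattern.fullmatch(filename)
    if not matched:
        return False
    try:
        file_date = datetime.strptime(matched.group(2), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date
        return False
    return today - timedelta(days=days) <= file_date <= today

# Fixed date for illustration only; use date.today() in practice
today = date(2022, 10, 12)
print(is_recent("Custom-prefix_Shop_no24_20221009.csv", today))  # True
print(is_recent("Custom-prefix_Shop_no24_20220901.csv", today))  # False
```

Parsing the digits with strptime, rather than comparing strings, also rejects names such as Custom-prefix_Shop_no24_20221399.csv, where the eight digits are not a real date.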