I am using parse.parse to find a file according to how its name matches a given pattern. Is it possible to define specific field widths inside the pattern to add more conditions to the file search?
Suppose the following code:
from parse import parse

patterns = ['file_{city:3w}_{date:6w}', 'file_another_one_{city:3w}_{date:6w}']

def find_and_display_pattern(filename):
    print('### searching for file {} ###'.format(filename))
    for pattern in patterns:
        parse_result = parse(pattern, filename)
        if not parse is None:
            print('{} pattern found for file {}'.format(pattern, filename))
            print('result :')
            print(parse_result)
            return

find_and_display_pattern('file_PRS_02182022')
find_and_display_pattern('file_another_one_PRS_02182022')
I get the following output:
### searching for file file_PRS_02182022 ###
file_{city:3w}_{date:6w} pattern found for file file_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': '02182022'}>
### searching for file file_another_one_PRS_02182022 ###
file_{city:3w}_{date:6w} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'another_one_PRS', 'date': '02182022'}>
My issue with the file 'file_another_one_PRS_02182022' is that I expect it to match the second pattern, 'file_another_one_{city:3w}_{date:6w}', with the specified field widths (3 characters for city and 6 characters for date), which would give the following output:
### searching for file file_another_one_PRS_02182022 ###
file_another_one_{city:3w}_{date:6w} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': '02182022'}>
Can parse.parse handle this? If not, is there another way to do this?
CodePudding user response:
The patterns you are using do not do what you expect. Take for example the w type:
"Letters, numbers and underscore"
This means one or more letters, numbers and underscores. In regular expressions, however, \w matches only a single letter, number or underscore. I guess this is why you specified the width in front of w, but it is unnecessary. Then, about width and precision:
"Width specifies a minimum size and precision specifies a maximum"
In your case the best option is precision, because cities are at most 3 characters long. If you need exactly 3 characters, then use both: 3.3.
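To see why the first pattern captured 'another_one_PRS', it can help to write the patterns as roughly equivalent regular expressions. The sketch below uses only the standard re module. The three regexes (LOOSE, STRICT, STRICT_OTHER) are assumed approximations chosen for illustration, not the expressions parse builds internally.

```python
import re

# Assumed approximation of 'file_{city:3w}_{date:6w}':
# the width acts as a minimum, so the greedy \w can also swallow underscores.
LOOSE = re.compile(r"file_(?P<city>\w{3,})_(?P<date>\w{6,})")

# Assumed approximation of a city field that is exactly 3 characters long.
STRICT = re.compile(r"file_(?P<city>.{3})_(?P<date>\d{8})")
STRICT_OTHER = re.compile(r"file_another_one_(?P<city>.{3})_(?P<date>\d{8})")

name = "file_another_one_PRS_02182022"

loose = LOOSE.fullmatch(name)
strict = STRICT.fullmatch(name)
strict_other = STRICT_OTHER.fullmatch(name)

print(loose.groupdict())         # city absorbs 'another_one_PRS'
print(strict)                    # None: the 3 characters after 'file_' are not followed by '_'
print(strict_other.groupdict())  # city is 'PRS'
```

With a minimum width only, the first pattern is free to match the longer file name. Once the city is limited to exactly 3 characters, only the second pattern can match it.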
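Moreover, you made a mistake in the if statement: if not parse is None should be if parse_result is not None. The original condition tests the imported parse function, which is never None, so it is always true and every pattern is reported as found.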
This is an improved version that does what you want and defines a custom parser (see Custom Type Conversion) for the date type:
from parse import parse, with_pattern
from datetime import datetime

# Custom type: exactly 8 digits, converted to a date (MMDDYYYY)
@with_pattern(r'\d{8}')
def parse_date(text):
    return datetime.strptime(text, '%m%d%Y').date()

# 3.3 = width 3 and precision 3, so city is exactly 3 characters
patterns = ['file_{city:3.3}_{date:Date}', 'file_another_one_{city:3.3}_{date:Date}']

def find_and_display_pattern(filename):
    print('### searching for file {} ###'.format(filename))
    for pattern in patterns:
        parse_result = parse(pattern, filename, dict(Date=parse_date))
        if parse_result is not None:
            print('{} pattern found for file {}'.format(pattern, filename))
            print('result :')
            print(parse_result)
            return

find_and_display_pattern('file_PRS_02182022')
find_and_display_pattern('file_another_one_PRS_02182022')
Output:
### searching for file file_PRS_02182022 ###
file_{city:3.3}_{date:Date} pattern found for file file_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': datetime.date(2022, 2, 18)}>
### searching for file file_another_one_PRS_02182022 ###
file_another_one_{city:3.3}_{date:Date} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': datetime.date(2022, 2, 18)}>
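The date conversion can also be tried on its own, without parse. This is a minimal sketch using only the standard library; it repeats the parse_date body from the answer without the decorator. Note that \d{8} accepts any eight digits, so an impossible date still reaches the converter and makes strptime raise ValueError. How parse reacts to an exception raised inside a converter is not covered here, so it is worth checking for your version.

```python
from datetime import date, datetime

def parse_date(text):
    """Convert an MMDDYYYY string to a date; raises ValueError if it is not a real date."""
    return datetime.strptime(text, "%m%d%Y").date()

print(parse_date("02182022"))  # 2022-02-18

try:
    parse_date("13452022")  # eight digits, but there is no month 13
except ValueError as exc:
    print("rejected:", exc)
```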