parse.parse, specify expressions width

Time:02-19

I am using parse.parse to find a file by matching its name against a given pattern. Is it possible to give the expressions inside the pattern a specific width, so that the file search is more restrictive?

Consider the following code:

from parse import parse

patterns = ['file_{city:3w}_{date:6w}', 'file_another_one_{city:3w}_{date:6w}']

def find_and_display_pattern(filename):
    print('### searching for file {} ###'.format(filename))
    for pattern in patterns:
        parse_result = parse(pattern, filename)
        if not parse is None:
            print('{} pattern found for file {}'.format(pattern, filename))
            print('result :')
            print(parse_result)
            return

find_and_display_pattern('file_PRS_02182022')
find_and_display_pattern('file_another_one_PRS_02182022')

I get the following output:

### searching for file file_PRS_02182022 ###
file_{city:3w}_{date:6w} pattern found for file file_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': '02182022'}>
### searching for file file_another_one_PRS_02182022 ###
file_{city:3w}_{date:6w} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'another_one_PRS', 'date': '02182022'}>

My issue with the file 'file_another_one_PRS_02182022' is that I expect it to match the second pattern, 'file_another_one_{city:3w}_{date:6w}', with the specified expression widths (3 characters for city and 6 characters for date). That would give the following output:

### searching for file file_another_one_PRS_02182022 ###
file_another_one_{city:3w}_{date:6w} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': '02182022'}>

Can parse.parse handle this? If not, is there another way to do it?

CodePudding user response:

The patterns you are using do not do what you expect. Take the w type, for example, which the parse documentation describes as:

Letters, numbers and underscore

In parse, this means any number of letters, numbers and underscores. In regular expressions, by contrast, \w matches exactly one letter, number or underscore. I guess this is why you put a width in front of w, but it is unnecessary. As for width and precision, the documentation says:

Width specifies a minimum size and precision specifies a maximum

In your case, precision is the best option because city codes are at most 3 characters long. If you need exactly 3 characters, use both width and precision: 3.3.
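To make the difference between a minimum width and an exact width concrete, here is a minimal sketch that uses Python's standard re module rather than parse itself. The two regexes are assumed analogues of the format specs (a bare width of 3 behaving like "at least 3", and 3.3 like "exactly 3"); they illustrate the idea and are not necessarily what parse generates internally.

```python
import re

# Assumed regex analogues of the format specs discussed above:
#   width 3 on a word field  -> at least 3 word characters  (\w{3,})
#   width.precision of 3.3   -> exactly 3 word characters   (\w{3})
at_least_3 = re.compile(r'file_(\w{3,})_(\d{8})')
exactly_3 = re.compile(r'file_(\w{3})_(\d{8})')

name = 'file_another_one_PRS_02182022'

# \w also matches underscores, so a minimum width happily swallows
# 'another_one_PRS' as the city.
loose = at_least_3.fullmatch(name)
# An exact width of 3 cannot stretch over 'another_one_PRS'.
strict = exactly_3.fullmatch(name)

print(loose.group(1))  # another_one_PRS
print(strict)          # None
```

This mirrors the behavior in the question: a minimum width alone lets the first pattern capture the longer file name.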

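Moreover, you made a mistake in the if statement: if not parse is None should be if parse_result is not None. The name parse refers to the imported function, which is never None, so the original condition is always true and the loop returns at the first pattern regardless of whether it matched.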
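The effect of that condition can be seen in isolation. The sketch below uses a hypothetical stand-in for parse.parse that always returns None, as the real function does when nothing matches, so the example runs without the library installed.

```python
def parse(pattern, text):
    """Hypothetical stand-in for parse.parse: pretend nothing ever matches."""
    return None

parse_result = parse('file_{city}', 'unrelated_name.txt')

# Buggy check: tests the function object, which is never None,
# so this is True even though the match failed.
buggy_check = not parse is None

# Fixed check: tests the value returned by the call.
fixed_check = parse_result is not None

print(buggy_check)  # True
print(fixed_check)  # False
```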


This is an improved version that does what you want and defines a custom parser (see Custom Type Conversion) for the date type:

from parse import parse, with_pattern
from datetime import datetime

@with_pattern(r'\d{8}')
def parse_date(text):
    return datetime.strptime(text, '%m%d%Y').date()

patterns = ['file_{city:3.3}_{date:Date}', 'file_another_one_{city:3.3}_{date:Date}']

def find_and_display_pattern(filename):
    print('### searching for file {} ###'.format(filename))
    for pattern in patterns:
        parse_result = parse(pattern, filename, dict(Date=parse_date))
        if parse_result is not None:
            print('{} pattern found for file {}'.format(pattern, filename))
            print('result :')
            print(parse_result)
            return

find_and_display_pattern('file_PRS_02182022')
find_and_display_pattern('file_another_one_PRS_02182022')

Output:

### searching for file file_PRS_02182022 ###
file_{city:3.3}_{date:Date} pattern found for file file_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': datetime.date(2022, 2, 18)}>
### searching for file file_another_one_PRS_02182022 ###
file_another_one_{city:3.3}_{date:Date} pattern found for file file_another_one_PRS_02182022
result :
<Result () {'city': 'PRS', 'date': datetime.date(2022, 2, 18)}>
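As for another way to do this: the same matching can be sketched with the standard library alone. This is an alternative technique, not part of parse. The names REGEXES and match_filename are hypothetical, and restricting city to exactly three ASCII letters is an assumption based on the sample file names.

```python
import re
from datetime import datetime

# Hypothetical regex equivalents of the two file name patterns.
# Assumption: city is exactly 3 ASCII letters, date is 8 digits (MMDDYYYY).
REGEXES = [
    re.compile(r'file_(?P<city>[A-Za-z]{3})_(?P<date>\d{8})'),
    re.compile(r'file_another_one_(?P<city>[A-Za-z]{3})_(?P<date>\d{8})'),
]

def match_filename(filename):
    """Return (pattern, fields) for the first regex matching the whole name, else None."""
    for regex in REGEXES:
        m = regex.fullmatch(filename)
        if m is not None:
            fields = m.groupdict()
            # Convert the captured digits into a date object.
            fields['date'] = datetime.strptime(fields['date'], '%m%d%Y').date()
            return regex.pattern, fields
    return None

print(match_filename('file_PRS_02182022'))
print(match_filename('file_another_one_PRS_02182022'))
print(match_filename('file_PARIS_02182022'))  # None: city is not 3 letters
```

Because fullmatch requires the whole file name to fit and the city width is exact, the first regex can no longer capture 'another_one_PRS'. Note that eight digits that do not form a valid date make strptime raise ValueError.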