Is there a neat way to extract the dates from this list of strings where some don't have a date

Time:11-01

I'm trying to make a quick script to streamline some boring accounting. Basically I have a folder full of files with names similar to what is contained in the list below.

I need to rename the files as indicated in the first couple of file names.

I have a clear idea of how to do this and was writing a quick script to get it done, but I hit a bit of a silly problem. I want to use a list comprehension to get a list of the dates, roughly as illustrated in the last line of the code below. Ideally, what I want to do would be:

[re.search(date_pattern, file).group() for file in list_of_reciepts]

But this fails on filenames that have no date field, because re.search returns None for them and None has no group method.

Any thoughts on a nice neat alternative?

import re

list_of_reciepts = [
   '2021-10-18 1.pdf',
   '2021-10-18 2.pdf',
   '2021-10-18 3.pdf',
   'Financial History - Linkt.pdf',
   'Scan from 2021-10-04 05_14_16 PM.pdf',
   'Scan from 2021-10-07 11_41_26 AM.pdf',
   'Scan from 2021-10-19 05_13_22 PM.pdf',
]

date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

>>> [re.search(date_pattern, file) for file in list_of_reciepts]
[<re.Match object; span=(0, 10), match='2021-10-18'>,
 <re.Match object; span=(0, 10), match='2021-10-18'>,
 <re.Match object; span=(0, 10), match='2021-10-18'>,
 None,
 <re.Match object; span=(10, 20), match='2021-10-04'>,
 <re.Match object; span=(10, 20), match='2021-10-07'>,
 <re.Match object; span=(10, 20), match='2021-10-19'>]

CodePudding user response:

If you use Python >= 3.8, you can use the walrus operator (an assignment expression) to bind the match and filter on it in one pass:

>>> [sre.group() for file in list_of_reciepts
         if (sre := re.search(date_pattern, file))]

['2021-10-18',
 '2021-10-18',
 '2021-10-18',
 '2021-10-04',
 '2021-10-07',
 '2021-10-19']

For Python < 3.8, use a double comprehension:

>>> [sre.group() for sre in [re.search(date_pattern, file)
         for file in list_of_reciepts] if sre]
['2021-10-18',
 '2021-10-18',
 '2021-10-18',
 '2021-10-04',
 '2021-10-07',
 '2021-10-19']

If you want to keep None:

>>> [sre.group() if (sre := re.search(date_pattern, file)) else None
         for file in list_of_reciepts]
['2021-10-18',
 '2021-10-18',
 '2021-10-18',
 None,
 '2021-10-04',
 '2021-10-07',
 '2021-10-19']
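
The keep-None variant above needs the walrus operator. On Python < 3.8, one possible equivalent is a small helper function. This is only a sketch: the name find_date is hypothetical, and the list is shortened to three of the original filenames.

```python
import re

date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')


def find_date(filename):
    """Return the first YYYY-MM-DD substring in filename, or None if absent."""
    # find_date is a hypothetical helper name, not part of the original answer.
    match = date_pattern.search(filename)
    return match.group() if match else None


# Shortened sample of the filenames from the question.
list_of_reciepts = [
    '2021-10-18 1.pdf',
    'Financial History - Linkt.pdf',
    'Scan from 2021-10-04 05_14_16 PM.pdf',
]

dates = [find_date(file) for file in list_of_reciepts]
print(dates)  # ['2021-10-18', None, '2021-10-04']
```

The helper keeps the comprehension short and works on any Python 3 version, at the cost of one extra named function.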

CodePudding user response:

Use the walrus operator (Python >= 3.8):

res = [x.group() if (x := re.search(date_pattern, file)) else None for file in list_of_reciepts]
print(res)

Output

['2021-10-18', '2021-10-18', '2021-10-18', None, '2021-10-04', '2021-10-07', '2021-10-19']

As an alternative, since you are compiling the regular expression, you could use map as below:

res = [match.group() if match else match for match in map(date_pattern.search, list_of_reciepts)]
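
Since the end goal is renaming files, it may help to keep each filename next to its date. The following is a minimal sketch of the map approach extended with zip, assuming a dict keyed by filename is useful; the name dates_by_file is hypothetical, and the list is shortened.

```python
import re

date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

# Shortened sample of the filenames from the question.
list_of_reciepts = [
    '2021-10-18 1.pdf',
    'Financial History - Linkt.pdf',
    'Scan from 2021-10-07 11_41_26 AM.pdf',
]

# map applies the compiled pattern's search method to every filename.
matches = map(date_pattern.search, list_of_reciepts)

# dates_by_file is a hypothetical name: filename -> date string or None.
dates_by_file = {
    file: (match.group() if match else None)
    for file, match in zip(list_of_reciepts, matches)
}

for file, date in dates_by_file.items():
    print(file, '->', date)
```

Files that map to None can then be skipped or handled separately before renaming.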

CodePudding user response:

You can use getattr for a cleaner and shorter approach than using an assignment expression with a conditional:

import re
v = ['2021-10-18 1.pdf', '2021-10-18 2.pdf', '2021-10-18 3.pdf', 'Financial History - Linkt.pdf', 'Scan from 2021-10-04 05_14_16 PM.pdf', 'Scan from 2021-10-07 11_41_26 AM.pdf', 'Scan from 2021-10-19 05_13_22 PM.pdf']
p = re.compile(r'\d{4}-\d{2}-\d{2}')
r = [getattr(re.search(p, i), 'group', lambda: None)() for i in v]

Output:

['2021-10-18', '2021-10-18', '2021-10-18', None, '2021-10-04', '2021-10-07', '2021-10-19']
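
To see why the getattr version works, it can be unpacked into separate steps. This is an illustrative sketch only; the variable names hit, miss, and fallback are made up for the example.

```python
import re

p = re.compile(r'\d{4}-\d{2}-\d{2}')

hit = p.search('Scan from 2021-10-04 05_14_16 PM.pdf')   # a re.Match object
miss = p.search('Financial History - Linkt.pdf')          # None


def fallback():
    """Stand-in for the lambda in the one-liner: always returns None."""
    return None


# A Match has a 'group' attribute, so getattr returns the bound method.
getter_for_hit = getattr(hit, 'group', fallback)
# None has no 'group' attribute, so getattr returns the fallback instead.
getter_for_miss = getattr(miss, 'group', fallback)

print(getter_for_hit())   # 2021-10-04
print(getter_for_miss())  # None
```

Either way the result of getattr is callable, which is why the one-liner can put () after it without checking for None first.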