Regex to extract a string after a date in Python

Time:11-03

Given these two types of string:

1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip

1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip

How can I use a regex to get the 111040 part of the string? It always has 6 digits.

My approach is: "Take the 6-digit code after the YYYY_MM_DD_HH_MM_SS_ part", but any other approach is also welcome.

EDIT: The last part, _0CM.csv.zip, is subject to change.

Thanks in advance.

CodePudding user response:

You wanted a regex, so here it is:

[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})
  • [0-9]{4}: match the 4-digit year; this is our starting anchor.
  • (?:_[0-9]{2}){5}: the year is followed by five two-digit fields (month, day, hour, minute, second), each preceded by an underscore, so we can put one field in a non-capturing group, repeat it 5 times, and ignore the result.
  • _([0-9]{6}): after one more underscore, capture the 6 digits that follow the previous expression.

The desired number is in capture group 1 of this regex:

import re
# Group 1 captures the 6 digits after the YYYY_MM_DD_HH_MM_SS_ timestamp
regex = r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})'
print(re.search(regex, '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip').group(1))  # 111040
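re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on an unexpected filename. The sketch below guards against that and, purely as an illustration, writes the same pattern in re.VERBOSE mode so each part of the breakdown sits next to its own comment. The helper name extract_code and the last two sample filenames (one with a changed suffix, one with no timestamp) are hypothetical.

```python
import re
from typing import Optional

# Equivalent to [0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6}), written in verbose mode:
# whitespace is ignored and "#" starts a comment inside the pattern.
TIMESTAMP_CODE = re.compile(
    r"""
    [0-9]{4}          # 4-digit year: the starting anchor
    (?:_[0-9]{2}){5}  # month, day, hour, minute, second (not captured)
    _([0-9]{6})       # group 1: the 6-digit part we want
    """,
    re.VERBOSE,
)


def extract_code(filename: str) -> Optional[str]:
    """Return the 6 digits after YYYY_MM_DD_HH_MM_SS_, or None if absent."""
    match = TIMESTAMP_CODE.search(filename)
    # Guard against None before calling .group(1)
    return match.group(1) if match else None


samples = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1625212673449-2021_07_02_07_55_05_111040_XYZ.txt",  # hypothetical new suffix
    "no-timestamp-here.csv.zip",  # hypothetical non-matching name
]

for name in samples:
    print(name, "->", extract_code(name))
```

Because the pattern stops right after the six digits, a different suffix (the EDIT in the question) does not affect the match; the name without a timestamp yields None instead of an exception.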

CodePudding user response:

How about this pattern? It works if you match each string (line) one at a time:

import re
# Raw string so \d reaches the regex engine instead of being read as a string escape
pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')
print(pattern.findall("1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"))  # ['111040']
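Used one string at a time, findall returns a list holding the contents of the single capture group for each match, so the code is its first element. A minimal sketch of the line-by-line usage, assuming (hypothetically) that the filenames arrive as one per line in a block of text:

```python
import re

# Same pattern as above, as a raw string so "\d" is not treated as a string escape
pattern = re.compile(r"\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})")

# Hypothetical input: one filename per line, e.g. read from a listing file
listing = """\
1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
"""

codes = []
for line in listing.splitlines():
    # findall returns the group-1 captures for this line; keep the first, if any
    found = pattern.findall(line)
    if found:
        codes.append(found[0])

print(codes)
```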

CodePudding user response:

This returns an empty string '' for any string in which no match is found.

import re

strings = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    'Test'
]

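# Raw string so \d is passed to the regex engine unchanged
pattern = re.compile(r'_(\d{6})_')

# Search each string only once; use '' where there is no match
matches = (pattern.search(string) for string in strings)
digits = [m.group(1) if m else '' for m in matches]

print(digits)  # ['111040', '111040', '']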
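The question also welcomes approaches other than regex. Assuming (this is not stated in the question) that the timestamp block always follows the last "-" and that no "-" appears in the suffix, plain string splitting works too: the seventh underscore-separated field is the code. code_by_split is a hypothetical helper name.

```python
def code_by_split(filename: str) -> str:
    """Return the 6-digit field of a name like <prefix>-YYYY_MM_DD_HH_MM_SS_<code>_<suffix>.

    Hypothetical helper; raises ValueError if the name does not fit that layout.
    """
    # Keep only the text after the last '-', dropping the numeric prefix(es)
    stamp = filename.rsplit("-", 1)[-1]
    # Expected fields: year, month, day, hour, minute, second, code, suffix...
    fields = stamp.split("_")
    if len(fields) < 7 or len(fields[6]) != 6 or not fields[6].isdigit():
        raise ValueError(f"unexpected filename layout: {filename!r}")
    return fields[6]


names = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
]
print([code_by_split(name) for name in names])
```

Unlike the regex versions, this sketch raises ValueError on a name such as 'Test' rather than returning '' or None, which makes a changed naming scheme easier to notice.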