I have these two types of strings:
1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
How can I extract the 111040 part of the string using a regex? It always has 6 digits.
My approach is: "Take the 6 digit code after the YYYY_MM_DD_HH_MM_SS_ part", but any other approach is also welcome.
EDIT: The last part, _0CM.csv.zip, is susceptible to change.
Thanks in advance.
CodePudding user response:
You wanted a regex so here it is:
[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})
[0-9]{4}: match the 4 digits of the year; this is our starting anchor.
(?:_[0-9]{2}){5}: after that come 5 two-digit numbers (month, day, hour, minute, second), so we can group them all in a non-capturing group and ignore them.
([0-9]{6}): capture the 6 digits following the previous expression.
The desired number is in capture group 1 of this regex:
import re
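# Raw string keeps the pattern safe if backslashes are added later
regex = r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})'
# group(1) is the 6-digit code captured after the timestamp
print(re.search(regex, '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip').group(1))  # 111040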
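Note that re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a string without the timestamp. A minimal sketch of a guard around the same pattern follows; the names TIMESTAMP_CODE and extract_code are made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as in the answer; the constant and function names are hypothetical.
TIMESTAMP_CODE = re.compile(r'[0-9]{4}(?:_[0-9]{2}){5}_([0-9]{6})')

def extract_code(filename):
    """Return the 6-digit code after the timestamp, or None if there is no match."""
    match = TIMESTAMP_CODE.search(filename)
    return match.group(1) if match else None

names = [
    '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip',
    '1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip',
    'Test',  # no timestamp, so no match
]
for name in names:
    print(extract_code(name))
```

Both sample file names should give 111040, and the string without a timestamp gives None instead of raising.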
CodePudding user response:
How about this pattern? It works if you match the input line by line:
import re
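# Raw string so that \d is not treated as a string escape sequence
pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')
# With a single capture group, findall returns a list of the captured strings
print(pattern.findall("1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"))  # ['111040']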
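Matching line by line could look roughly like the sketch below. It assumes the file names arrive as one multi-line string, which the question does not state; the variable names text and codes are illustrative only.

```python
import re

pattern = re.compile(r'\d{4}_\d{2}_\d{2}_\d{2}_\d{2}_\d{2}_(\d{6})')

# Assumed input shape: one file name per line.
text = """1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip
1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip"""

codes = []
for line in text.splitlines():
    match = pattern.search(line)
    if match:  # skip lines that do not contain the timestamp
        codes.append(match.group(1))

print(codes)
```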
CodePudding user response:
This will return '' if an appropriate match isn't found.
import re
strings = [
    "1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    "1635508858063-1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip",
    'Test'
]
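pattern = re.compile(r'_(\d{6})_')
# Search each string only once; fall back to '' when there is no match
matches = (pattern.search(string) for string in strings)
digits = [match.group(1) if match else '' for match in matches]
print(digits)  # ['111040', '111040', '']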
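One caveat, given the question's EDIT that the last part of the name can change: this pattern needs an underscore on both sides of the code. The sketch below compares it with a relaxed variant that only requires that no further digit follows. The relaxed pattern, the helper first_group, and the second file name are assumptions made for illustration, not something from the question.

```python
import re

# Pattern from the answer above: requires an underscore on both sides.
strict = re.compile(r'_(\d{6})_')
# Assumed alternative: underscore before, and no additional digit after.
relaxed = re.compile(r'_(\d{6})(?!\d)')

def first_group(pattern, text):
    """Return capture group 1 of the first match, or '' if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else ''

original = '1625212673449-2021_07_02_07_55_05_111040_0CM.csv.zip'
# Hypothetical file name whose suffix no longer starts with an underscore.
changed = '1625212673449-2021_07_02_07_55_05_111040.csv.zip'

for name in (original, changed):
    print(repr(first_group(strict, name)), repr(first_group(relaxed, name)))
```

On the hypothetical changed name the strict pattern finds nothing, while the relaxed one still returns the code. Anchoring on the full timestamp, as in the other answers, is stricter than either.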