I have this string:
Book Release Date: 2 June, 2010 [Edition#5]
Book Release Date: 24 October, 1996
I want to use a regex to extract only the dates, like this:
2 June, 2010
24 October, 1996
I have tried these patterns, which come close to what I want:
# this pattern returns:
# 2 June, 2010 [Edition#5]
# 24 October, 1996
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=(\[|\n))", text)
# this pattern returns:
# 2 June, 2010
# None
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=\[)", text)
CodePudding user response:
You don't need any lookaround assertions, just a single capture group, whose contents re.findall returns.
\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b
Explanation
\bBook Release Date:    A word boundary, then the literal text followed by a space
(    Start capture group 1
\d+ [A-Z][a-z]+    Match 1+ digits, a space, an uppercase char A-Z and 1+ lowercase chars
, \d{4}    Match a comma, a space and 4 digits
)    Close group 1
\b    A word boundary to prevent a partial word match
Example
import re
pattern = r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b"
s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
"Book Release Date: 24 October, 1996")
print(re.findall(pattern, s))
Output
['2 June, 2010', '24 October, 1996']
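For comparison, here is a minimal sketch of how the lookaround approach from the question could be made to work. It assumes each date is followed either by optional whitespace and a `[`, or by the end of the line; the variable names `text` and `dates` are only illustrative. Because the pattern has no capture groups, re.findall returns the whole match rather than tuples of groups.

```python
import re

text = (
    "Book Release Date: 2 June, 2010 [Edition#5]\n"
    "Book Release Date: 24 October, 1996"
)

# Lookbehind: the match must start right after the label.
# Lookahead: stop before optional whitespace plus "[", or at the end of a line.
# re.MULTILINE makes "$" match at the end of every line,
# not only at the end of the whole string.
pattern = r"(?<=Book Release Date: ).*?(?=\s*\[|$)"

dates = re.findall(pattern, text, flags=re.MULTILINE)
print(dates)  # ['2 June, 2010', '24 October, 1996']
```

The capture-group version above is still the simpler option, since it also checks that the text actually looks like a date.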
CodePudding user response:
Use this pattern:
:\s(\d+)\s(\w+,)\s(\d+)
Code:
import re
s ='''
Book Release Date: 2 June, 2010 [Edition#5]
Book Release Date: 24 October, 1996
'''
print([' '.join(i) for i in re.findall(r':\s(\d+)\s(\w+,)\s(\d+)', s)])
Output:
['2 June, 2010', '24 October, 1996']
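The join is needed because re.findall returns a list of tuples when the pattern contains more than one capture group. The sketch below just splits that one-liner into two steps to show the intermediate value; the names `raw` and `dates` are illustrative, and the pattern is assumed to use `+` quantifiers (`\d+`, `\w+`).

```python
import re

s = """
Book Release Date: 2 June, 2010 [Edition#5]
Book Release Date: 24 October, 1996
"""

pattern = r":\s(\d+)\s(\w+,)\s(\d+)"

# Three capture groups, so each match comes back as a 3-tuple.
raw = re.findall(pattern, s)
print(raw)  # [('2', 'June,', '2010'), ('24', 'October,', '1996')]

# Joining the parts with a space rebuilds each date string.
dates = [" ".join(parts) for parts in raw]
print(dates)  # ['2 June, 2010', '24 October, 1996']
```

Note that this pattern keys only on the colon, so it would also match any other "colon, number, word, comma, number" sequence in the text.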