Regex end with a character or end of line with lookahead

Time:09-19

I have this string

Book Release Date: 2 June, 2010 [Edition#5]

Book Release Date: 24 October, 1996

I want to use a regex to extract only the dates, like this:

2 June, 2010

24 October, 1996

I have tried these patterns, which come close to what I want:

# this pattern result
# 2 June, 2010 [Edition#5]
# 24 October, 1996
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=(\[|\n))", text)

# this pattern result
# 2 June, 2010
# None
date = re.findall(r"(?<=(Book Release Date: ))(.*?)(?=\[)", text)

CodePudding user response:

You don't need any lookaround assertions, just a single capture group, whose contents re.findall returns.

\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b

Explanation

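  • \bBook Release Date: Match the literal text (followed by a space), starting at a word boundary
  • ( Capture group 1
    • \d+ [A-Z][a-z]+ Match 1+ digits, a space, an uppercase char A-Z, and 1+ lowercase chars
    • , \d{4} Match a comma, a space, and 4 digits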
  • ) Close group 1
  • \b A word boundary to prevent a partial word match
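The breakdown above can also be written out as a commented pattern. This is only a sketch of the same regex using re.VERBOSE; the names pattern, match and date are illustrative, and literal spaces are written as [ ] because verbose mode ignores unescaped whitespace.

```python
import re

# Verbose form of the same pattern; literal spaces are written as "[ ]"
# because re.VERBOSE ignores unescaped whitespace.
pattern = re.compile(r"""
    \bBook[ ]Release[ ]Date:[ ]   # literal prefix, starting at a word boundary
    (                             # capture group 1, the date
        \d+[ ][A-Z][a-z]+         # day, space, capitalized month name
        ,[ ]\d{4}                 # comma, space, four digit year
    )
    \b                            # word boundary after the year
""", re.VERBOSE)

match = pattern.search("Book Release Date: 2 June, 2010 [Edition#5]")
date = match.group(1) if match else None
print(date)
```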


Example

import re
 
pattern = r"\bBook Release Date: (\d+ [A-Z][a-z]+, \d{4})\b"
 
s = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
    "Book Release Date: 24 October, 1996")
 
print(re.findall(pattern, s))

Output

['2 June, 2010', '24 October, 1996']
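If you do want to keep the lookaround approach from the question, here is a minimal sketch of one way it could work. Two things likely caused trouble in the original attempts: capture groups inside the lookarounds make re.findall return tuples rather than plain strings, and a lookahead for \n cannot succeed on a last line that has no trailing newline. The version below assumes the date is followed either by optional whitespace and a [ or by the end of the line; the names text, lookaround and dates are illustrative.

```python
import re

text = ("Book Release Date: 2 June, 2010 [Edition#5]\n"
        "Book Release Date: 24 October, 1996")

# Lazily match after the label, stopping before optional whitespace plus "[",
# or before optional whitespace at the end of a line ($ with re.MULTILINE).
# No capture groups are used, so findall returns the whole match as a string.
lookaround = r"(?<=Book Release Date: ).*?(?=\s*\[|\s*$)"

dates = re.findall(lookaround, text, flags=re.MULTILINE)
print(dates)
```

Unlike the capture group pattern, this one does not check that the matched text looks like a date, so it returns whatever follows the label.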

CodePudding user response:

Use this pattern: :\s(\d+)\s(\w+,)\s(\d+)

Code:

import re

s ='''
Book Release Date: 2 June, 2010 [Edition#5]

Book Release Date: 24 October, 1996

'''

print([' '.join(i) for i in re.findall(r':\s(\d+)\s(\w+,)\s(\d+)', s)])

Output:

['2 June, 2010', '24 October, 1996']