Extract data if between substrings else full string

Time:06-17

I have strings like these:

Beginning through June 18, 2022 at Noon standard time\n
Jan 20, 2022
Beginning through April 26, 2022 at 12:01 a.m. standard time

I want to extract the part that appears after the word "through" and before the word "at", using Python regex. If those words are absent, I want the full string:

June 18, 2022
Jan 20, 2022
April 26, 2022

I can extract it from the long text using a regex group:

import re
s = "Beginning through June 18, 2022 at Noon standard time"
re.search(r'(.*through)(.*) (at.*)', s).group(2)

However, it does not work for:

s = "June 18, 2022"

Can anyone help me with that?

CodePudding user response:

You may use this regex with a capture group:

(?:.* through |^)(.+?)(?: at |$)


RegEx Details:

  • (?:.* through |^): Match anything followed by " through ", or the start position
  • (.+?): Match one or more of any character, as few as possible (lazy), and capture it in group #1
  • (?: at |$): Match " at " or the end of the string

Code:

import re
arr = ['Beginning through June 18, 2022 at Noon standard time',
'Jan 20, 2022',
'Beginning through April 26, 2022 at 12:01 a.m. standard time']

# findall returns a list holding the contents of capture group #1
for i in arr:
    print(re.findall(r'(?:.* through |^)(.+?)(?: at |$)', i))

Output:

['June 18, 2022']
['Jan 20, 2022']
['April 26, 2022']
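
If a plain string is more convenient than the one-element list that findall returns, the same regex can be wrapped in a small helper. This is only a sketch: the name extract_date is hypothetical, and returning None for input that does not match (such as an empty string) is an assumption about what the caller wants.

```python
import re

# Same regex as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'(?:.* through |^)(.+?)(?: at |$)')

def extract_date(text):
    """Hypothetical helper: return capture group #1, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(extract_date('Beginning through June 18, 2022 at Noon standard time'))
print(extract_date('Jan 20, 2022'))
```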

CodePudding user response:

How about playing with optional groups and backtracking?

^(?:.*?through )?(.*?)(?: at.*)?$

See this demo at regex101 or a Python demo at tio.run

Note that if only one of the substrings is present, it captures either from after the first substring to the end of the string, or from the start of the string up to the second substring. If neither is present, it captures the full string.
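
A quick way to check the cases described in that note is to run the optional-groups regex with Python's built-in re module. The sketch below is illustrative only: the helper name extract and the two sample strings containing only one of the substrings are made up for the example, not taken from the question.

```python
import re

# Optional-groups regex from above: the "through" prefix and the " at" suffix
# are both optional, and the lazy group #1 captures what lies between them.
PATTERN = re.compile(r'^(?:.*?through )?(.*?)(?: at.*)?$')

def extract(text):
    """Hypothetical helper: return capture group #1, or None if there is no match."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

samples = [
    'Beginning through June 18, 2022 at Noon standard time',  # both substrings
    'Jan 20, 2022',                                           # neither substring
    'Beginning through June 18, 2022',                        # only "through" (made-up sample)
    'June 18, 2022 at Noon',                                  # only " at" (made-up sample)
]

for s in samples:
    print(repr(extract(s)))
```

The first, third, and fourth samples all give 'June 18, 2022', and the second gives the full string 'Jan 20, 2022'.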


Another idea could be to use PyPI regex which supports branch reset groups.

^(?|.*?through (.+?) at|(.+))

This one extracts the part in between if both substrings are present, else the full string. As far as I know, the regex module is largely compatible with the functions of Python's re module; just use import regex as re instead.

Demo at regex101 or Python demo at tio.run
