I have a string that looks like this:
'My substring1. My substring2: My substring3: My substring4'
Ideally, my aim is to extract 'My substring2' from this string with Python regex. However, I would also be pleased with a result that resembles '. My substring2:'
So far, I am able to extract
'. My substring2: My substring3:'
with
"\.\s.*:"
Alternatively, using Wiktor Stribiżew's solution to a somewhat similar problem (posted in "How can i extract words from a string before colon and excluding \n from them in python using regex"), I have been able to extract
'My substring1. My substring2'
specifically with
r'^[^:-][^:]*'
However, after many hours of searching and trying (I am quite new to regex), I have been unable to combine the two results into a single regular expression that extracts 'My substring2' from the string above.
I would be eternally grateful if someone could help me find the correct regular expression to extract 'My substring2'. Thanks!
CodePudding user response:
You can use a non-greedy quantifier (with ?):
import re
s = "My substring1. My substring2: My substring3: My substring4"
print(re.search(r"\.\s*(.*?):", s).group(1))
Prints:
My substring2
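To make the difference visible, here is a small sketch that runs the greedy and the non-greedy version of the same pattern side by side. The variable names greedy and lazy are only illustrative and not part of the answer above.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

# Greedy: .* grabs as much as it can, so the match runs up to the LAST colon.
greedy = re.search(r"\.\s*(.*):", s).group(1)

# Non-greedy: .*? grabs as little as it can, so the match stops at the FIRST colon.
lazy = re.search(r"\.\s*(.*?):", s).group(1)

print(greedy)  # My substring2: My substring3
print(lazy)    # My substring2
```

The greedy result is essentially what the pattern in the question returned, which is why adding the ? is enough here.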
CodePudding user response:
You might, for example, exclude matching the dot as well, and use a capture group matching any char except the :
^[^:-][^:.]*\.\s*([^:]+)
Explanation
^ Start of string
[^:-] The first char cannot be either : or -
[^:.]* Optionally match any char except : or .
\.\s* Match a dot and optional whitespace chars
([^:]+) Capture group 1, match 1+ chars other than :
Or a bit shorter, if there cannot be a : or . or - before the dot that is matched:
^[^:.-]+\.\s*([^:]+)
For example
import re
s = "My substring1. My substring2: My substring3: My substring4"
pattern = r"[^:-][^:.]*\.\s*([^:]+)"
m = re.match(pattern, s)
if m:
    print(m.group(1))
Output
My substring2
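As a rough sketch of how the shorter pattern behaves on inputs it is not meant to match, the helper below wraps it in a function that returns None instead of raising. The function name extract and the extra test strings are assumptions made up for illustration.

```python
import re

# Shorter variant from above: no ':', '.' or '-' allowed before the first dot.
PATTERN = re.compile(r"^[^:.-]+\.\s*([^:]+)")

def extract(text):
    """Return the text between the first dot and the next colon, or None."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

print(extract("My substring1. My substring2: My substring3: My substring4"))
print(extract("no dot here: something"))       # None, a colon comes before any dot
print(extract("-My substring1. My substring2: x"))  # None, starts with '-'
```

Checking for None before calling group(1) avoids an AttributeError when the input does not have the expected shape.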
CodePudding user response:
With your shown samples, please try the following regex; the code is written and tested in Python 3. Here is the online demo for the regex used.
import re
s = "My substring1. My substring2: My substring3: My substring4"
re.findall(r'^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$',s)
['My substring2']
Explanation: I am using the re.findall function of Python 3's re module. The variable s holds the value 'My substring1. My substring2: My substring3: My substring4', and the regex used is: ^.*?\.\s([^:]+)(?:(?::\s[^:]*)+)$
Explanation of regex: Following is the detailed explanation for above regex.
^.*?\.\s ##Lazily match from the start of the value up to the first literal dot followed by a whitespace character.
([^:]+) ##The one and only capturing group, which holds everything just before the next : here.
(?: ##Start a non-capturing group.
(?: ##Start a 2nd non-capturing group.
:\s[^:]* ##Match a colon followed by a whitespace character, then everything up to the next colon (or the end of the value).
)+ ##Close the 2nd non-capturing group and match 1 or more occurrences of it.
)$ ##Close the first non-capturing group at the end of the value.
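The commented breakdown above maps fairly directly onto Python's re.VERBOSE mode, where whitespace in the pattern is ignored and # starts a comment. The following is a sketch of that idea, assuming the same sample string; the name pattern is only illustrative.

```python
import re

s = "My substring1. My substring2: My substring3: My substring4"

pattern = re.compile(r"""
    ^.*?\.\s          # lazily match up to the first dot followed by whitespace
    ([^:]+)           # capture group 1: everything before the next colon
    (?:               # outer non-capturing group
        (?::\s[^:]*)+ # one or more ": text" segments
    )$                # must reach the end of the string
""", re.VERBOSE)

# With a single capture group, findall returns the group's text for each match.
print(pattern.findall(s))  # ['My substring2']
```

Because the pattern is anchored with ^ and $, it only matches strings that have at least one ": text" segment after the captured part.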