Good morning. I have a question: using web scraping, I extract a piece of information as a string like this:
"Issued May2018No expiration date"
I want to split this string into two strings using a regular expression. My idea is: wherever 4 digits are followed by "No", insert a separator to produce the following string:
"Issued May2018 - No expiration date"
That way, I can call the "split" method on " - " to get two strings:
- Issued May2018
- No expiration date
I was thinking of using the regex
\d\d\d\dNo
which should match 2018No, but I don't know how to proceed so that the match is replaced and the string reads
May2018 - No expiration date
which would prepare the string for the split function.
Any suggestions? Other approaches are welcome.
CodePudding user response:
You can use a capture group to capture the 4 digits, followed by a literal No and a space.
In the replacement, use the value of capture group 1 (\1) followed by " - No ".
import re
s = "Issued May2018No expiration date"
pattern = r"(\d{4})No "
print(re.sub(pattern, r"\1 - No ", s))
Output
Issued May2018 - No expiration date
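To get the two strings the question asks for, the substituted string can then be split on the separator. This is a minimal sketch that assumes the input always contains four digits followed by "No "; if it does not, no separator is inserted and the two-name unpacking raises a ValueError. The names with_separator, issued and expiration are only illustrative.

```python
import re

s = "Issued May2018No expiration date"

# Insert " - " between the four digits and "No", then split on that separator.
with_separator = re.sub(r"(\d{4})No ", r"\1 - No ", s)
issued, expiration = with_separator.split(" - ", maxsplit=1)

print(issued)      # Issued May2018
print(expiration)  # No expiration date
```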
CodePudding user response:
Use re.sub. In the string passed to the repl parameter of re.sub(), \g<1> stands for the text matched by capture group 1 (and \g<2> for group 2).
import re
s = "Issued May2018No expiration date"
print(re.sub(r"(\d{4})(No)", r"\g<1> - \g<2>", s))  # raw strings avoid invalid-escape warnings for \d and \g
# 'Issued May2018 - No expiration date'
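If the " - " separator is only a stepping stone toward the split, one alternative is to cut at the same position directly with re.split and zero-width lookarounds. This is a sketch, not part of the answer above, and it assumes Python 3.7 or later, where re.split can split on a pattern that matches an empty string.

```python
import re

s = "Issued May2018No expiration date"

# (?<=\d{4}) requires four digits just before the cut; (?=No) requires "No" just after.
# The pattern matches an empty string, so no characters are removed from the input.
parts = re.split(r"(?<=\d{4})(?=No)", s, maxsplit=1)

print(parts)  # ['Issued May2018', 'No expiration date']
```

If the pattern is not found, parts is a one-element list holding the original string.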
CodePudding user response:
import re
string = "Issued May2018No expiration date"
m = re.findall(r"^(.*[0-9]{4})(No.*)$", string)
print(m[0][0] + " - " + m[0][1])  # join the two captured parts with the separator
->
Issued May2018 - No expiration date
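Since the two capture groups already hold the two target strings, they can also be unpacked directly without building the joined string first. The sketch below uses re.fullmatch in place of findall with ^ and $ anchors; the fallback for a non-matching input (keeping the whole string and setting the second part to None) is an assumption about what would be wanted in that case.

```python
import re

string = "Issued May2018No expiration date"

# Group 1: everything up to and including the four digits; group 2: "No" and the rest.
m = re.fullmatch(r"(.*[0-9]{4})(No.*)", string)
if m:
    issued, expiration = m.groups()
else:
    # Assumed fallback: no "No" after four digits, so there is nothing to split off.
    issued, expiration = string, None

print(issued)      # Issued May2018
print(expiration)  # No expiration date
```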