I'm currently struggling with regex. I'm trying to remove every website ending in ".com" except one, "crypto.com", because it's not just a website but also the name of a cryptocurrency.
Let's take this sentence:
"Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
Inspired by this answer, this is my Python regex:
r"(\w+\.)?crypto\.com"
The problem, when I test it on https://regex101.com, is that it matches only crypto.com and none of the others (the others are what I actually want to match).
Can anyone tell me how to proceed? Thank you!
My code:
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
text = re.sub(r"(\w+\.)?crypto\.com", '', text)
Expected output:
"Here are my favorite things: crypto.com,, and "
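A quick way to see the problem outside regex101 is to list what the original pattern actually matches. This is a small sketch using only the standard re module; finditer is used instead of findall because the optional capturing group would make findall return just the group's contents rather than the full match.

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
pattern = r"(\w+\.)?crypto\.com"

# Collect every full match (group 0). The pattern literally requires
# the text "crypto.com", so that is the only thing it can find.
matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)  # ['crypto.com']

# As a result, re.sub removes exactly the domain that should have been kept.
result = re.sub(pattern, "", text)
print(result)  # Here are my favorite things: , polo.com, cryp.com and google.com
```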
Answer:
You can use
\s*\b(?!crypto\.)\w+\.com\b
See the regex demo. Details:
- \s* - zero or more whitespaces
- \b - a word boundary
- (?!crypto\.) - a negative lookahead that fails the match if there is a crypto. string immediately to the right of the current location
- \w+ - one or more word chars
- \.com - a literal .com
- \b - a word boundary.
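The same pattern can be written with the re.VERBOSE flag so that each piece carries the explanation above as an inline comment. This is only a restatement of the answer's regex for readability, not a different technique; the name COM_EXCEPT_CRYPTO is just an illustrative choice.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE:
# whitespace inside the pattern is ignored and '#' starts a comment.
COM_EXCEPT_CRYPTO = re.compile(
    r"""
    \s*            # zero or more whitespace characters
    \b             # a word boundary
    (?!crypto\.)   # fail if 'crypto.' starts right here
    \w+            # one or more word characters (the domain name)
    \.com          # a literal '.com'
    \b             # a word boundary
    """,
    re.VERBOSE,
)

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
result = COM_EXCEPT_CRYPTO.sub("", text)
print(result)  # Here are my favorite things: crypto.com,, and
```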
See the Python demo:
import re
text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
print( re.sub(r'\s*\b(?!crypto\.)\w+\.com\b', '', text) )
# => Here are my favorite things: crypto.com,, and
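If more than one name ever needs to be protected, the lookahead can take an alternation of names. This is a sketch under that assumption; remove_com_domains and its keep parameter are hypothetical names, not part of the answer. re.escape is used so that names containing regex metacharacters are matched literally.

```python
import re

def remove_com_domains(text, keep=("crypto",)):
    """Remove name.com tokens from text, except the names listed in keep.

    Hypothetical helper: it only generalizes the answer's single
    lookahead on "crypto." to a lookahead on any of the kept names.
    """
    # Escape each kept name and join them into one alternation.
    alternatives = "|".join(re.escape(name) for name in keep)
    # Same structure as the answer's pattern, with the lookahead widened.
    pattern = rf"\s*\b(?!(?:{alternatives})\.)\w+\.com\b"
    return re.sub(pattern, "", text)

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
print(remove_com_domains(text))
# Here are my favorite things: crypto.com,, and
print(remove_com_domains(text, keep=("crypto", "google")))
# Here are my favorite things: crypto.com,, and google.com
```

Because each name in the lookahead is followed by \., a kept name like "cryp" would protect cryp.com without also protecting crypto.com.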
A more comprehensive regex can also be used to remove the commas and the word "and" as well:
(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?
See this regex demo.
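Running the more comprehensive pattern on the same sentence is a quick way to check what it removes. Traced by hand, the comma right after crypto.com survives: the optional "," in the prefix can only be consumed after the whitespace, but in ", polo.com" the comma comes before the space, so each match starts at the space instead.

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# The answer's extended pattern: an optional whitespace/comma/"and" prefix,
# then any name.com except crypto.com, then an optional trailing comma.
pattern = r"(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?"

result = re.sub(pattern, "", text)
print(repr(result))  # 'Here are my favorite things: crypto.com,'
```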
Answer:
Use a negative look-behind:
(\w+)?(?<!crypto)\.com
Edit: The question changed slightly. I removed a \. that was incorrect, so now it should work!
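A sketch of the look-behind version on the same sentence. It removes only the domain names, so the separators (commas, spaces and "and") stay behind. Since (?<!crypto) only inspects the characters just before ".com", it would also keep any name that merely ends in "crypto"; notcrypto.com below is a made-up example to show this.

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# (\w+)? is the optional domain name; (?<!crypto) rejects a '.com'
# that is immediately preceded by the letters "crypto".
pattern = r"(\w+)?(?<!crypto)\.com"

result = re.sub(pattern, "", text)
print(repr(result))  # 'Here are my favorite things: crypto.com, ,  and '

# Caveat: the look-behind also protects names that only end in "crypto".
# "notcrypto.com" is a made-up example.
caveat = re.sub(pattern, "", "see notcrypto.com")
print(caveat)  # see notcrypto.com
```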