Remove every website ending with .com except one using regex

I'm currently struggling with regex. I'm trying to remove every website name ending in ".com" except one, "crypto.com", because it is not only a website but also the name of a cryptocurrency.

Let's take this sentence:

"Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

Inspired by this answer, this is my Python regex:

r"(\w+\.)?crypto\.com"

The problem, when I test it on https://regex101.com, is that it matches only crypto.com and not the others, which are the ones I actually want to match.

Can anyone tell me how to proceed? Thank you!

Expected code:

import re
text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
text = re.sub(r"(\w+\.)?crypto\.com", '', text)

Expected output:

"Here are my favorite things: crypto.com,, and "

CodePudding user response：

You can use

\s*\b(?!crypto\.)\w+\.com\b

See the regex demo. Details:

\s* - zero or more whitespaces
\b - a word boundary
(?!crypto\.) - a negative lookahead that fails the match if the string crypto. appears immediately to the right of the current location
\w+ - one or more word chars
\.com - .com
\b - a word boundary.

See the Python demo:

import re
text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
print(re.sub(r'\s*\b(?!crypto\.)\w+\.com\b', '', text))
# => Here are my favorite things: crypto.com,, and
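
To see exactly which substrings the pattern picks up, a small sketch with re.findall may help. It assumes the quantifier after \w is + (matching the "one or more word chars" description above); the names pattern and matches are illustrative only.

```python
import re

# Assumption: the quantifier after \w is '+', as the details list describes.
pattern = re.compile(r"""
    \s*             # zero or more whitespace characters
    \b              # word boundary
    (?!crypto\.)    # fail if 'crypto.' starts at this position
    \w+             # one or more word characters
    \.com           # a literal '.com'
    \b              # word boundary
""", re.VERBOSE)

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# No capturing groups, so findall returns the full text of each match.
matches = pattern.findall(text)
print(matches)
```

Each match includes the whitespace in front of the domain, which is why the substitution leaves ",," rather than ", ," behind.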

A more comprehensive regex can also be used to remove commas and the word and:

(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?

See this regex demo.
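
A quick way to check this longer pattern in Python is sketched below, again assuming \w+ is the intended quantifier. The variable names are illustrative.

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# Assumption: '\w+' is the intended quantifier in the pattern.
# The optional prefix consumes a leading ',' or 'and'; the trailing ',?'
# consumes a comma that directly follows the removed domain.
pattern = r"(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?"

cleaned = re.sub(pattern, "", text)
print(cleaned)
```

With the sample sentence this leaves "Here are my favorite things: crypto.com," so the comma directly after crypto.com is still present and may need a final strip if it is unwanted.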

CodePudding user response：

Use a negative lookbehind:

(\w+)?(?<!crypto)\.com

Edit: The question changed slightly, so I removed a \. that was incorrect. It should work now!
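
A minimal sketch of how this lookbehind behaves in Python, assuming \w+ is the intended quantifier. The second input string is a made-up example, not from the question, added to show one way this pattern differs from a lookahead anchored at a word boundary.

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# Assumption: '\w+' is the intended quantifier.
lookbehind = r"(\w+)?(?<!crypto)\.com"

# Leading whitespace is not part of the match, so the spaces stay in place.
result = re.sub(lookbehind, "", text)
print(repr(result))

# The lookbehind only inspects the six characters before '.com', so a longer
# name that merely ends in 'crypto' (hypothetical input) is also left alone.
other = re.sub(lookbehind, "", "mycrypto.com and polo.com")
print(repr(other))
```

Whether keeping names such as mycrypto.com is desirable depends on the data; if only the exact name crypto.com should survive, a lookahead placed after a word boundary is the stricter choice.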