I want to clean up obfuscated email addresses using a single regex pattern: replace "at" or "At" with "@", and replace "dot" with ".".
Code:
import re
email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
pa = re.compile(r'(\s+[\(\[]*\s*at*\s*[\)\]]*\s+)', flags=re.IGNORECASE)
em = pa.sub(r'@',email)
print(em)
Output
abc@xyz.com, abc@xyz.com, abc@xyz [dot] com
Expected output
abc@xyz.com, abc@xyz.com, abc@xyz.com
How can I replace '[dot]' with '.'?
CodePudding user response:
To replace two or more characters in a string using a single re.sub() call in Python, you can use the + character in your regular expression pattern to match one or more occurrences of the characters you want to replace.
Here's an example:
import re
email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
# Define a regular expression pattern that matches leading whitespace,
# optional opening parentheses or square brackets, an "a" followed by
# one or more "t" (case-insensitive), optional closing parentheses or
# square brackets, and any trailing whitespace.
pattern = r'(\s+[\(\[]*\s*at+\s*[\)\]]*\s*)'
# Compile the pattern and replace each match with the "@" character.
# This handles only the "at" forms; "[dot]" is left unchanged.
pa = re.compile(pattern, flags=re.IGNORECASE)
em = pa.sub(r'@', email)
print(em)
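As a small illustration of what the + quantifier changes, the following sketch compares \s with \s+ on a made-up sample string (the variable names and the sample are assumptions, not part of the question):

```python
import re

# Hypothetical sample with several spaces around "at".
sample = "abc   at   xyz.com"

# \s matches exactly one whitespace character, so the extra spaces remain.
single = re.sub(r'\sat\s', '@', sample)

# \s+ matches one or more whitespace characters, consuming each whole run.
greedy = re.sub(r'\s+at\s+', '@', sample)

print(single)
print(greedy)
```

Only the \s+ version collapses the whole run of spaces into the replacement.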
CodePudding user response:
Requiring the substitution to happen with a single pattern just pushes the problem to a different corner. In brief, the second argument to re.sub can be a function of arbitrary complexity, but then requiring that function to be inlined to a single line seems somewhat disingenuous.
Here, we pass re.sub a function which uses a simple dictionary to decide what to replace each match with.
import re
email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
pa = re.compile(r'\W*(at|dot)\W*', flags=re.IGNORECASE)
em = pa.sub(lambda m: {'dot': '.', 'at': '@'}[m.group(1).lower()], email)
print(em)
The main trick is to capture just the dictionary key into the parenthesized subexpression, which is then available in m.group(1).
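One caveat is that a pattern like \W*(at|dot)\W* also matches "at" or "dot" inside longer words such as "data". A possible refinement, sketched here under the assumption that the obfuscated tokens always appear as standalone words, adds \b word boundaries and moves the lookup into a named function. The names REPLACEMENTS, PATTERN, and deobfuscate are made up for illustration.

```python
import re

# Hypothetical mapping from obfuscated token to the real character.
REPLACEMENTS = {'at': '@', 'dot': '.'}

# \b keeps "at"/"dot" from matching inside longer words such as "data";
# \W* swallows surrounding spaces, parentheses, and brackets.
PATTERN = re.compile(r'\W*\b(at|dot)\b\W*', flags=re.IGNORECASE)

def deobfuscate(text):
    """Replace standalone "at"/"dot" tokens (any case) with "@"/"."."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1).lower()], text)

print(deobfuscate("abc (at) xyz [dot] com"))
print(deobfuscate("data At xyz DOT org"))
```

The captured group is still the dictionary key, so supporting another token would only need a new dictionary entry and a new alternative in the pattern.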