Replace two or more characters in a string using a single-pattern sub function in Python regular expressions

Time:12-06

Replace invalid email address characters using a single regex pattern. Replace "At" and "at" with "@", and replace "dot" with ".".

Code:

import re

email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
pa = re.compile(r'(\s+[\(\[]*\s*at*\s*[\)\]]*\s+)', flags=re.IGNORECASE)
em = pa.sub(r'@',email)
print(em)

Output:

abc@xyz.com, abc@xyz.com, abc@xyz [dot] com

Expected output:

abc@xyz.com, abc@xyz.com, abc@xyz.com

How can I replace '[dot]' with '.'?

User response:

To replace two or more characters in a string with a single re.sub() call in Python, you can use the + quantifier in your regular expression pattern to match one or more occurrences of the characters you want to replace.

Here's an example:

import re

email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"

# Define a regular expression pattern that matches one or more whitespace
# characters, any opening parentheses or square brackets, the letter "a"
# followed by one or more "t" characters, and then any closing
# parentheses, square brackets, and whitespace.
pattern = r'(\s+[\(\[]*\s*at+\s*[\)\]]*\s*)'

# Compile the pattern and use re.sub() (via the compiled pattern's sub
# method) to replace each match with the "@" character. "[dot]" is not
# matched by this pattern, so this prints:
# abc@xyz.com, abc@xyz.com, abc@xyz [dot] com
pa = re.compile(pattern, flags=re.IGNORECASE)
em = pa.sub(r'@', email)

print(em)
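To make the role of + concrete, here is a small sketch comparing \s with \s+ on a made-up input that has runs of extra spaces around "at" (the sample string is an illustration, not from the question):

```python
import re

text = "abc   at  xyz.com"  # made-up input with runs of spaces

# \s matches exactly one whitespace character, so only one space on each
# side of "at" is consumed and the remaining spaces survive.
single = re.sub(r'\sat\s', '@', text)

# \s+ matches one or more whitespace characters, so each whole run of
# spaces around "at" is consumed by the same match.
multi = re.sub(r'\s+at\s+', '@', text)

print(repr(single))  # 'abc  @ xyz.com'
print(repr(multi))   # 'abc@xyz.com'
```

Without the +, leftover spaces end up around the "@", which is why the pattern above uses \s+ rather than \s.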

User response:

Requiring the substitution to happen with a single pattern just pushes the problem to a different corner. In brief, the second argument to re.sub can be a function of arbitrary complexity, but then requiring that function to be inlined to a single line seems somewhat disingenuous.

Here, we call re.sub with a function that uses a simple dictionary to decide what to replace each match with.

import re

email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
pa = re.compile(r'\W*(at|dot)\W*', flags=re.IGNORECASE)
em = pa.sub(lambda m: {'dot': '.', 'at': '@'}[m.group(1).lower()], email)
print(em)

The main trick is to capture just the dictionary key into the parenthesized subexpression, which is then available in .group(1).
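A quick way to see that trick in action is to list what each match captures. This sketch reuses the answer's pattern with re.finditer; `pairs` is just an illustrative name:

```python
import re

email = "abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"
pa = re.compile(r'\W*(at|dot)\W*', flags=re.IGNORECASE)

# group(0) is the full span that sub() replaces (the keyword plus the
# surrounding spaces and brackets); group(1) is only the captured keyword.
pairs = [(m.group(0), m.group(1)) for m in pa.finditer(email)]

for whole, key in pairs:
    print(repr(whole), '->', repr(key))
```

This prints ' at ' -> 'at', ' At ' -> 'At', ' (at) ' -> 'at' and ' [dot] ' -> 'dot'. Because group(1) keeps the original case ("At"), the dictionary lookup needs .lower().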

Demo: https://ideone.com/3Llu0i
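Since the replacement function does not have to be inlined, here is a sketch of the same approach with a named function. The names `REPLACEMENTS`, `PATTERN`, `deobfuscate` and `clean_email` are hypothetical, and the \b word boundaries are an addition not present in the answer above, assumed here so that "at" inside a longer word such as "data" is left alone:

```python
import re

# Maps each captured keyword (lowercased) to its replacement character.
REPLACEMENTS = {'at': '@', 'dot': '.'}

# \b on both sides of the keyword (an addition to the original pattern) keeps
# "at" or "dot" inside a longer word, such as "data", from being replaced.
PATTERN = re.compile(r'\W*\b(at|dot)\b\W*', flags=re.IGNORECASE)


def deobfuscate(match):
    """Return the replacement for one match, keyed on the captured keyword."""
    return REPLACEMENTS[match.group(1).lower()]


def clean_email(text):
    """Hypothetical helper: rewrite every 'at'/'dot' token in one re.sub pass."""
    return PATTERN.sub(deobfuscate, text)


print(clean_email("abc at xyz.com, abc At xyz.com, abc (at) xyz [dot] com"))
print(clean_email("data at xyz dot com"))
```

The first call prints abc@xyz.com, abc@xyz.com, abc@xyz.com and the second prints data@xyz.com. Without the \b anchors, the "at" inside "data" would also be replaced.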
