Easiest way to clean email address in python

Time:08-28

I am having issues with email addresses: with a small correction, they can be converted to valid email addresses.

For example:

 [email protected], --- Not valid
'[email protected],  --- Not valid
([email protected]),  --- Not valid
([email protected]),  --- Not valid
:[email protected],  --- Not valid
//[email protected]  --- Not valid
[email protected]    ---  valid
...

I could write "if else" checks, but whenever an email address arrives with a new issue, I would need to write another "if else" and update the code every time.

What is the best way to clean up all these small issues: some Python package or a regex? Please suggest.

CodePudding user response:

You can do this (I basically keep each character of the email only if it is alphanumeric, a dot, or an @ sign, and remove it otherwise):

emails = [
    '[email protected]', 
    '([email protected])', 
    '([email protected])',  
    ':[email protected]',  
    '//[email protected]',
    '[email protected]'
    ]

def correct_email_format(email):
    return ''.join(e for e in email if (e.isalnum() or e in ['.', '@']))

for email in emails:
    corrected_email = correct_email_format(email)
    print(corrected_email)

output:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
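
One caveat with the character filter above: it also removes characters that are legal in addresses, such as "-", "_" and "+". A possible alternative, sketched here, is to pull the address-like substring out with a regex instead of filtering characters. The pattern is deliberately simplified (it is not a full RFC 5322 validator), and the name extract_email and the sample addresses are hypothetical, chosen only for illustration:

```python
import re

# Simplified pattern (an assumption, not a complete RFC 5322 validator):
# local part, "@", then a domain made of at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_email(raw):
    """Return the first address-like substring in raw, or None if there is none."""
    match = EMAIL_RE.search(raw)
    return match.group(0) if match else None

# Hypothetical sample inputs modelled on the kinds of noise in the question.
samples = [
    " user@example.com,",
    "'user@example.com,",
    "(first.last@example.com),",
    "//user_name+tag@example.co.uk",
    "not an email",
]

for s in samples:
    print(repr(s), "->", extract_email(s))
```

Because the function returns None when nothing matches, rows that cannot be repaired can be collected for manual review rather than silently turned into a wrong address.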

CodePudding user response:

Data clean-up is messy but I found the approach of defining a set of rules to be an easy way to manage this (order of the rules matters):

rules = [
        lambda s: s.replace('\xa0', ' '),  # normalize non-breaking spaces to regular spaces
        lambda s: s.strip(" ,'"),
]

addresses = [
        ' [email protected],',
        '[email protected],'
]

for a in addresses:
    for r in rules:
        a = r(a)
    print(a)

and here is the resulting output:

[email protected]
[email protected]

Make sure you write a test suite that covers both invalid and valid data. It's easy to break, and you may be tweaking the rules often.
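
A minimal sketch of such a test suite, assuming the first rule is meant to normalize non-breaking spaces ('\xa0') and using hypothetical names (clean, CASES) and made-up addresses. The test functions use plain asserts, so they can be collected by pytest or called directly:

```python
# Rule pipeline in the style of the answer; order matters.
RULES = [
    lambda s: s.replace("\xa0", " "),  # assumption: non-breaking space -> space
    lambda s: s.strip(" ,'"),          # trim spaces, commas and quotes at both ends
]

def clean(address):
    """Apply every rule in order and return the cleaned address."""
    for rule in RULES:
        address = rule(address)
    return address

# (raw input, expected output) pairs; the addresses are illustrative only.
CASES = [
    ("\xa0user@example.com,", "user@example.com"),
    (" user@example.com,", "user@example.com"),
    ("'user@example.com,", "user@example.com"),
]

def test_dirty_addresses_are_cleaned():
    for raw, expected in CASES:
        assert clean(raw) == expected, f"{raw!r} -> {clean(raw)!r}"

def test_valid_address_is_unchanged():
    assert clean("user@example.com") == "user@example.com"
```

Each time a new kind of noise appears, add a pair to CASES first, watch the test fail, then add or reorder a rule until it passes.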
