Easiest way to clean email addresses in Python

I am having issues with email addresses that are invalid but could be converted to valid ones with a small correction.

For example:

 [email protected], --- Not valid
'[email protected],  --- Not valid
([email protected]),  --- Not valid
([email protected]),  --- Not valid
:[email protected],  --- Not valid
//[email protected]  --- Not valid
[email protected]    ---  valid
...

I could write "if/else" checks, but whenever an email address arrives with a new kind of issue, I would need to add another branch and update the code.

What is the best way to clean up all these small issues: a Python package, or a regex? Please suggest.

CodePudding user response：

You can do this (I check whether each character in the email is alphanumeric, a dot, or an @ sign, and remove it if not). Note that this also drops characters such as '-', '_' and '+', which can appear in valid addresses:

emails = [
    '[email protected]', 
    '([email protected])', 
    '([email protected])',  
    ':[email protected]',  
    '//[email protected]',
    '[email protected]'
    ]

def correct_email_format(email):
    return ''.join(e for e in email if (e.isalnum() or e in ['.', '@']))

for email in emails:
    corrected_email = correct_email_format(email)
    print(corrected_email)

Output:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
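
Since the question also asks about a regex, here is a minimal sketch of that route. Instead of deleting unwanted characters, it extracts the first email-like substring, so hyphens, underscores and plus signs inside the address survive. The pattern is deliberately loose and is an assumption for illustration, not a full RFC 5322 validator; the name extract_email and the sample addresses are hypothetical and not taken from the question.

```python
import re

# Deliberately loose pattern (an assumption, not a full RFC 5322 validator):
# a local part, "@", then a domain that contains at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_email(raw):
    """Return the first email-like substring in raw, or None if there is none."""
    match = EMAIL_RE.search(raw)
    return match.group(0) if match else None

# Hypothetical sample inputs, not taken from the question.
samples = [
    " john.doe@example.com,",
    "'jane_doe@example.org,",
    "(a-b@example.net)",
    "//x+tag@mail.example.com",
    "no email here",
]

for s in samples:
    print(repr(s), "->", extract_email(s))
```

Returning None for input without an address lets the caller decide whether to skip, log, or reject that record rather than silently passing garbage through.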

CodePudding user response：

Data clean-up is messy but I found the approach of defining a set of rules to be an easy way to manage this (order of the rules matters):

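rules = [
        # Assumption: the first argument was probably a non-breaking space lost
        # in extraction; as printed, the rule replaced ' ' with ' ' (a no-op).
        lambda s: s.replace('\xa0', ' '),
        # Strip leading/trailing spaces, commas, and single quotes.
        lambda s: s.strip(" ,'"),
]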

addresses = [
        ' [email protected],',
        '[email protected],'
]

for a in addresses:
    for r in rules:
        a = r(a)
    print(a)

and here is the resulting output:

[email protected]
[email protected]

Make sure you write a test suite that covers both invalid and valid data. These rules are easy to break, and you may be tweaking them often.
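
A test suite for such rules could start as small as the sketch below. It is only an illustration: the helper names clean and run_tests, the bracket-stripping rule, and the sample addresses are all hypothetical, and plain assert statements stand in for whatever test framework you prefer.

```python
# Rules in the style of the answer above; the second rule is a hypothetical
# addition to show why the order of the rules matters.
rules = [
    lambda s: s.strip(" ,'"),   # drop surrounding spaces, commas, single quotes
    lambda s: s.strip("()"),    # then drop surrounding parentheses
]

def clean(address):
    """Apply every rule in order and return the cleaned address."""
    for rule in rules:
        address = rule(address)
    return address

# (raw input, expected result) pairs; the addresses are made-up examples.
CASES = [
    (" john.doe@example.com,", "john.doe@example.com"),
    ("(jane@example.org),", "jane@example.org"),
    ("john.doe@example.com", "john.doe@example.com"),  # valid data must pass through unchanged
]

def run_tests():
    """Check every case and return the number of cases that were checked."""
    for raw, expected in CASES:
        actual = clean(raw)
        assert actual == expected, f"{raw!r}: got {actual!r}, expected {expected!r}"
    return len(CASES)

print(run_tests(), "cases passed")
```

Each time a new kind of malformed address shows up, add it to CASES first, then add or reorder rules until the whole table passes again.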