Home > OS >  How to normalize email addresses without regex
How to normalize email addresses without regex

Time:12-01

The problem is that no matter what I do, I cannot seem to make a function that normalizes (not validates) without the help of regex. For example, instead of my code printing invalid or valid email address, I want it to:

  1. filter out signs, . signs, etc. that are BEFORE the @gmail.com part;
  2. make it so that the program doesn't discriminate between capitalized and uncapitalized letters (return the same thing).

Here are the current email addresses I'm trying to filter:

johnsmith [email protected]
# should return [email protected]
[email protected]
# should return [email protected]

Here's what I worked on so far:

def normalizeEmail(emailIn):
    if emailIn != regex:
        ch_1 = ' '
        ch_2 = '.'
        new_emailIn = (emailIn.lower().split(ch_1, 1)[0]).replace()   '@gmail.com'
        if emailIn.endswith('@gmail.com'):
            return new_emailIn 
        if new_emailIn != new_emailIn.endswith('gmailcom'):
            return (new_emailIn.lower().split(ch_2, 1)[0]) 

if __name__ == "__main__":
    print(normalizeEmail('johnsmith [email protected]'))
    print(normalizeEmail('[email protected]'))

The . needs to be replaced, and anything after the should be removed.

Earlier, I tried it with regex, but it never seemed to properly normalize the emails and would return my customized exception error:

Invalid Email Address

With regex, I tried:

import re
    def normalizeEmail(emailIn)
    regex  = '/^[a-zA-Z0-9.!#$%&’* /=?^_`{|}~-] @[a-zA-Z0-9-] (?:\.[a-zA-Z0-9-] )*$/'
    if emailIn == regex: 
      return 'emailIn'
    if emailIn != regex
      return 'Invalid Email Address'

Output:

Invalid Email Address 

I've been cracking at this program for awhile, and it's really frustrating me that I cannot crack it.

Mind you, the function cannot be too specific because it needs to normalize a dictionary of 35 other email addresses. Then, I'll sum down the list to the normalized emails and create a list of them.

Please help. This is my first 3 months of programming and I'm already dying.

I tried regex and expected the expected email format to be returned, but it wasn't, so I am seeking a solution without regex.

CodePudding user response:

This seems like a trivial task for the str class methods:

addresses = [
    'johnsmith [email protected]',
    '[email protected]'
]

for address in addresses:
    name, domain = map(str.lower, address.split('@'))
    if domain == 'gmail.com':
        name = name.replace('.', '')
        if ' ' in name:
            name, tag = name.split(' ', 1)
    print(f'{name}@{domain}')

The code above will result in the following output:

[email protected]
[email protected]

Or, if you want it as a function:

def normalize_address(address):
    name, domain = map(str.lower, address.split('@'))
    if domain == 'gmail.com':
        name = name.replace('.', '')
        if ' ' in name:
            name, tag = name.split(' ', 1)
    return f'{name}@{domain}'


addresses = [
    'johnsmith [email protected]',
    '[email protected]'
]

for address in addresses:
    print(normalize_address(address))
  • Related