What manipulations can be done to user emails to prevent duplicates

Time:10-14

I am working on email-based authentication that checks the database for existing users by email and decides whether to create a new account or use the existing one.

One issue I came across is that users sometimes use different capitalisation in their emails, add things like +1 in the middle, etc.

To combat some of these, I am now (1) stripping whitespace from the emails and (2) always lowercasing them.

I would like to take this further, but am not sure what else I can do without breaking some emails, i.e.:

(3) Can I remove everything after + and before the @ sign? (4) Can I remove other symbols like . from the emails?

Answer:

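In practice, email addresses are treated as case-insensitive (A and a are the same), so changing all upper case to lower case is fine. Strictly speaking, RFC 5321 allows the part before the @ to be case-sensitive, but it discourages relying on this and it is rare in practice. Digits (0-9) are also valid in email addresses.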
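The two normalisations the asker already applies (removing whitespace and lowercasing) can be sketched in Python. The question does not name a language, so Python is an assumption, and `normalize_email`, `get_or_create_user` and the in-memory `users` dict are hypothetical stand-ins for the asker's own code and database:

```python
def normalize_email(raw: str) -> str:
    """Build a lookup key from a user-typed email address."""
    # Remove every whitespace character (leading, trailing and inner);
    # whitespace is not valid in an unquoted address, so real addresses survive.
    no_spaces = "".join(raw.split())
    # Lowercase so "Jack@Example.com" and "jack@example.com" share one key.
    return no_spaces.lower()


# Hypothetical in-memory stand-in for the users table, keyed by normalised email.
users: dict[str, str] = {}


def get_or_create_user(raw_email: str) -> str:
    """Return the account key, creating the account the first time it is seen."""
    key = normalize_email(raw_email)
    if key not in users:
        # Keep what the user originally typed, e.g. for display purposes.
        users[key] = raw_email.strip()
    return key


print(get_or_create_user("  Jack@Example.COM\n"))  # jack@example.com
print(get_or_create_user("jack@example.com"))      # jack@example.com
print(len(users))                                  # 1
```

Storing the original spelling alongside the normalised key means you can still show users the address exactly as they typed it.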

However, you should not remove any of the following characters from an email address:

!#$%&'*+-/=?^_`{|}~

Control characters, whitespace and other specials such as ( ) , : ; < > [ ] \ and " are invalid in an unquoted address.

Apart from letters, digits and the dot, any character that is not in the list of 19 characters above makes the address invalid. How you handle that is up to you, but removing such characters is probably the best action.
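The character check described above can be sketched in Python. It assumes plain ASCII addresses (internationalised addresses under RFC 6531 allow more characters), it only looks at the part before the @, and both helper names are hypothetical. The second function follows the suggestion of removing invalid characters, which does change the address the user typed:

```python
import string

# The 19 special characters listed above.
SPECIALS = "!#$%&'*+-/=?^_`{|}~"
# Characters allowed in an unquoted local part: ASCII letters, digits, the
# specials, and the dot (the dot's placement rules are not checked here).
ALLOWED_LOCAL = set(string.ascii_letters + string.digits + SPECIALS + ".")


def invalid_local_chars(local_part: str) -> set[str]:
    """Return the characters in local_part that are not allowed unquoted."""
    return {ch for ch in local_part if ch not in ALLOWED_LOCAL}


def strip_invalid_local_chars(local_part: str) -> str:
    """Drop disallowed characters, keeping everything else (including '+')."""
    return "".join(ch for ch in local_part if ch in ALLOWED_LOCAL)


print(len(SPECIALS))                          # 19
print(invalid_local_chars("jack+finance"))    # set()
print(strip_invalid_local_chars("ja ck(1)"))  # jack1
```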

Why removing the + is an issue: some mail providers use it to sort (file) inbound email into folders for a user. So an address such as jack+finance@example.com would go to a finance folder in Jack's mailbox. But this is not a rule for all mail providers: elsewhere, jack+finance@example.com can be a different account from jack@example.com. So removing the + and what follows it could merge two different accounts into one, or produce an address that does not exist.
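If you still want to merge "+tag" variants, a more cautious sketch than stripping everything after + for every address is to do it only for domains you have verified treat + as a subaddress. The function name and the `plus_domains` allowlist are hypothetical, and `example.com` is a placeholder, not a claim about any real provider:

```python
def strip_subaddress(email: str, plus_domains: set[str]) -> str:
    """Drop a '+tag' from the local part, but only for domains in plus_domains.

    plus_domains is a hypothetical allowlist you would maintain yourself,
    containing only providers you have confirmed treat '+' as a subaddress.
    For every other domain the address is returned unchanged.
    """
    # Split on the last '@' so the domain is everything after it.
    local, sep, domain = email.rpartition("@")
    if not sep:
        return email  # no '@' at all: leave it for validation to reject
    if domain.lower() in plus_domains:
        base = local.split("+", 1)[0]
        if base:  # don't turn "+tag@..." into an empty local part
            local = base
    return f"{local}@{domain}"


allow = {"example.com"}
print(strip_subaddress("jack+finance@example.com", allow))  # jack@example.com
print(strip_subaddress("jack+finance@other.test", allow))   # jack+finance@other.test
```

The default is to leave an address alone, so an unknown provider never has two distinct accounts merged by accident.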

Answer:

Can I remove everything after + and before the @ sign? Can I remove other symbols like . from the emails?

Sure, you can - but should you?

If you don't care about standards and want to block valid email addresses, then block any characters you like.

RFC 822 (Standard for ARPA Internet Text Messages) and RFC 2822 (Internet Message Format, since superseded by RFC 5322) clearly specify the valid characters for email addresses.

+ is no different to x, ! or $.

The local-part (before @) can contain:

  • uppercase and lowercase Latin letters (A-Z, a-z)
  • numeric values (0-9)
  • special characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • the dot (.), provided it is not the first or last character and does not appear twice in a row
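The formal version of these rules is the unquoted "dot-atom" form in RFC 5322: runs of allowed characters separated by single dots. A sketch of a checker for just that form, assuming ASCII-only addresses; quoted local parts such as "john doe"@example.com are valid per the RFC but deliberately not accepted here, and the function name is hypothetical:

```python
import re

# RFC 5322 "atext": letters, digits and ! # $ % & ' * + - / = ? ^ _ ` { | } ~
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# "dot-atom": runs of atext separated by single dots, so no leading,
# trailing or consecutive dots.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_valid_unquoted_local_part(local: str) -> bool:
    # RFC 5321 limits the local part to 64 octets; for this ASCII-only
    # pattern, characters and octets are the same thing.
    return len(local) <= 64 and DOT_ATOM.fullmatch(local) is not None


for candidate in ["jack.smith+finance", ".jack", "jack..smith", "jack\\smith"]:
    print(candidate, is_valid_unquoted_local_part(candidate))
```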

...and you can block x, ! or $ or indeed any of them - but again - should you?

See: https://mozilla.fandom.com/wiki/User:Me_at_work/plushaters
