I am using Python and have a string of email addresses, as shown below.
email_addr = '[email protected], [email protected], [email protected]'
The string above looks good, but sometimes I receive data that contains blank email addresses.
For example:
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
I am currently using str.split(',') together with a lot of error checking. Is there a better way to do this?
The final value I expect is a conversion from:
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
to:
email_addr = '[email protected],[email protected],[email protected]'
CodePudding user response:
Try:
import re
email_addr = " , , [email protected], [email protected], , , ,[email protected]"
email_addr = email_addr.replace(" ", "").strip(",")
email_addr = re.sub(r",{2,}", ",", email_addr)
print(email_addr)
Prints:
[email protected],[email protected],[email protected]
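A minimal sketch of the same steps wrapped in a function, so the edge case of an all-blank input can be checked. The function name collapse_commas and the example.com addresses are hypothetical and used only for illustration. Note that this approach removes every space, including any inside a field.

```python
import re


def collapse_commas(raw):
    """Drop spaces, trim leading/trailing commas, then collapse comma runs."""
    no_spaces = raw.replace(" ", "").strip(",")
    return re.sub(r",{2,}", ",", no_spaces)


# Hypothetical sample data (not from the original question).
sample = " , , a@example.com, b@example.com, , , ,c@example.com"
cleaned = collapse_commas(sample)
print(cleaned)

# An input with only blank fields collapses to an empty string.
print(repr(collapse_commas(" , , ,")))
```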
CodePudding user response:
No need for regular expressions. Use .split(',') to split the string into a list of strings:
email_lst = email_addr.split(',')
Then join with commas, filtering out the blank values:
email_addr2 = ",".join(e.strip() for e in email_lst if e.strip())
# '[email protected],[email protected],[email protected]'
In Python 3.8+, you can use the walrus operator to avoid calling .strip() twice:
email_addr2 = ",".join(e for ee in email_lst if (e := ee.strip()))
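For interpreters older than 3.8, one possible variation is to strip in a generator first and filter afterwards, which also calls .strip() only once per field. This is a sketch, not part of the answer above; the helper name clean_emails and the sample addresses are hypothetical. Because .strip() removes any surrounding whitespace, tabs and newlines around a field are handled too.

```python
def clean_emails(raw):
    """Return the non-blank, whitespace-trimmed fields of a comma-separated string."""
    # First pass: strip each field lazily. Second pass: keep the non-empty ones.
    stripped = (field.strip() for field in raw.split(","))
    return [field for field in stripped if field]


# Hypothetical sample data, including a tab and a newline around fields.
sample = " , , a@example.com,\tb@example.com , ,\n,c@example.com"
emails = clean_emails(sample)
print(",".join(emails))
```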
CodePudding user response:
If we use regex, how about getting a list of matches with [^, ]+ and then joining all the items?
[^, ] means any character except a comma or a space, and + means "1 or more".
import re
email_addr = " , , [email protected], [email protected], , , ,[email protected]"
email_cleaned = ",".join(re.findall(r"[^, ]+", email_addr))
print(email_cleaned)
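One thing to keep in mind with a character class that excludes spaces: a field that contains an internal space is split into several matches. The sketch below illustrates this, assuming the pattern [^, ]+ and hypothetical example.com addresses.

```python
import re

# Pattern assumed here: one or more characters that are neither a comma nor a space.
TOKEN = re.compile(r"[^, ]+")

# Hypothetical sample data: plain addresses work as expected.
plain = " , , a@example.com, b@example.com, , , ,c@example.com"
plain_tokens = TOKEN.findall(plain)
print(",".join(plain_tokens))

# A field with a display name contains spaces, so it yields several tokens.
named = "Foo Bar <a@example.com>, b@example.com"
named_tokens = TOKEN.findall(named)
print(named_tokens)
```

If the input can contain display names, splitting on commas and stripping each field keeps such fields intact.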
CodePudding user response:
I'd be quite tempted to validate as you go and rely on email.utils.parseaddr, which goes some way toward ensuring that email clients will accept the addresses:
from email.utils import parseaddr as parse_email_addr
>>> parse_email_addr("Foo Bar <[email protected]>")
('Foo Bar', '[email protected]')
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
result = ",".join(filter(None, (parse_email_addr(email)[1] for email in email_addr.split(","))))
# '[email protected],[email protected],[email protected]'
I'd also be tempted to account for bad fields, which may indicate an input error (i.e., how did you get these? Should they be valid inputs to your program?). For example, you can count how many fields were dropped:
>>> result
'[email protected],[email protected],[email protected]'
>>> email_addr.rstrip(",").count(",") - result.count(",")
5
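The same bookkeeping can be sketched as a helper that returns both the kept addresses and the number of dropped fields, so the caller can decide whether to log or reject the input. The helper name split_valid_and_dropped and the sample addresses are hypothetical. Note that parseaddr is a lenient parser rather than a strict validator, so this only catches fields it cannot parse into an address at all.

```python
from email.utils import parseaddr


def split_valid_and_dropped(raw):
    """Return (addresses, dropped_count) for a comma-separated string."""
    fields = raw.split(",")
    # parseaddr returns a (display_name, address) pair; the address is ''
    # when the field cannot be parsed, e.g. when it is blank.
    parsed = (parseaddr(field)[1] for field in fields)
    addresses = [addr for addr in parsed if addr]
    return addresses, len(fields) - len(addresses)


# Hypothetical sample data (not from the original question).
sample = " , , a@example.com, b@example.com, , , ,c@example.com"
addresses, dropped = split_valid_and_dropped(sample)
print(",".join(addresses))
print(dropped)
```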