Python: Extract email addresses from filename strings

Time:05-12

I want to extract the email address from each filename. Here are some example filenames:

string1 = "[email protected]_2022-05-11T11_59_58+00_00.pdf"
string2 = "[email protected]_test.pdf"
string3 = "[email protected]"

I would like to split each filename into two parts: the first contains the email address and the second contains the rest. For string2, the result should be:

['[email protected]', '_test.pdf']

I tried this regex, but it does not work for the second and third strings:

email = re.search(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", string)
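A likely reason it fails can be seen by running the pattern on sample names. The real addresses in the question were obfuscated by the page, so this sketch uses hypothetical stand-ins (john.doe@example.com, with the third suffix taken from the answer's output) and assumes the stripped + quantifiers are restored as above. The domain class [a-z0-9\.\-+_] also accepts _, - and dots, so it keeps consuming past ".com" and only gives back enough characters to end on a final ".<lowercase letters>", which is the file extension. In the first sample the uppercase T in the timestamp is outside the class, which would explain why that string happens to work.

```python
import re

# Hypothetical stand-ins: the real addresses in the question were obfuscated.
samples = [
    "john.doe@example.com_2022-05-11T11_59_58+00_00.pdf",
    "john.doe@example.com_test.pdf",
    "john.doe@example.com-fdsdfsd-saf.pdf",
]

# The question's pattern with the '+' quantifiers restored (an assumption).
question_pattern = r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+"

# The domain class also accepts '_', '-' and '.', so it runs past ".com"
# and backtracks only as far as the last ".<lowercase letters>".
matches = [re.search(question_pattern, s).group() for s in samples]
for m in matches:
    print(m)
```

With these stand-ins, only the first match stops at "john.doe@example.com"; the other two swallow the whole filename.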

Thank you for your help.

Answer:

Given the samples you provided, you can do something like this:

import re

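strings = ["[email protected]_2022-05-11T11_59_58+00_00.pdf",
           "[email protected]_test.pdf",
           "[email protected]"]

# Group 1: the email (non-@ characters, '@', then letters and dots).
# Group 2: everything after the email.
pattern = r'([^@]+@[\.A-Za-z]+)(.*)'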

[re.findall(pattern, string)[0] for string in strings]

Output:

[('[email protected]', '_2022-05-11T11_59_58+00_00.pdf'),
 ('[email protected]', '_test.pdf'),
 ('[email protected]', '-fdsdfsd-saf.pdf')]
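Two behaviours of this approach are worth checking. Indexing with [0] raises IndexError when a filename contains no @, and because the domain class [\.A-Za-z] includes the dot, an extension that directly follows the address (a hypothetical john.doe@example.com.pdf) is absorbed into the email. A minimal sketch that keeps the same pattern but uses re.match, which anchors at the start of the string, and returns None instead of raising; split_filename is a hypothetical helper name:

```python
import re

# Same pattern as the answer: email = non-@ characters, '@', letters/dots.
PATTERN = re.compile(r'([^@]+@[\.A-Za-z]+)(.*)')

def split_filename(filename):
    """Return (email, rest), or None if the filename has no email part.

    split_filename is a hypothetical helper used for illustration.
    """
    m = PATTERN.match(filename)
    return m.groups() if m else None

print(split_filename("john.doe@example.com_test.pdf"))
print(split_filename("report.pdf"))                 # no '@' -> None
print(split_filename("john.doe@example.com.pdf"))   # '.pdf' ends up in the email
```

If filenames like the last one can occur, the domain part needs a stricter rule (for example, a known list of extensions to stop at); the samples in the question do not show that case.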
    

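Email pattern explanation, ([^@]+@[\.A-Za-z]+):

  • [^@]+: one or more characters other than @ (the local part)
  • @: the literal @ sign
  • [\.A-Za-z]+: one or more letters or dots (the domain); matching stops at the first character outside this set, such as _ or -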

Rest pattern explanation, (.*):

  • (.*): any sequence of characters, possibly empty, i.e. everything after the domain
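The same pattern can be written with re.VERBOSE so each piece carries its explanation, and named groups make the two parts easier to read back. This is only a restatement of the answer's pattern; the group names email and rest are illustrative choices, and the filename uses a hypothetical address.

```python
import re

# The answer's pattern in verbose form: whitespace and '#' comments
# outside character classes are ignored by the regex engine.
VERBOSE_PATTERN = re.compile(r"""
    (?P<email>
        [^@]+          # one or more characters other than '@'
        @              # the literal '@'
        [\.A-Za-z]+    # one or more letters or dots (the domain)
    )
    (?P<rest>.*)       # everything after the domain, possibly empty
""", re.VERBOSE)

m = VERBOSE_PATTERN.match("john.doe@example.com_2022-05-11T11_59_58+00_00.pdf")
print(m.group("email"))  # john.doe@example.com
print(m.group("rest"))   # _2022-05-11T11_59_58+00_00.pdf
```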