regexpr: extract part of a string containing a specific e-mail address

Time:07-28

Working with regular expressions is always a nightmare (to me!). I don't clearly understand the difference between greedy and non-greedy (lazy) matching, or how to activate one or the other :(

I have a CSV file like this:

row1, value2, value3, value4
row2, value2, value3, value4
"Main string", "secondary string", "[email protected], [email protected], [email protected],name4.surname4@whatever-host" "footer"
"ABC", "ABC", "[email protected]"  "last-row-value"
row5, value2, value3, value4
"Main string", "secondary string", "[email protected]"  "last-row-value"
"[email protected], [email protected]", 2, 3, 4
row999, value2, value3, value4

My goal (with regexpr) is to extract only the (full) quoted string on rows containing "stefano.test@whatever-host.com":

"[email protected], [email protected], [email protected],name4.surname4@whatever-host"
"[email protected]"
"[email protected]"
"[email protected], [email protected]"

I've started from this regexpr...

(").*(stefano\.test@whatever-host\.com).*(")

...but it gets me the full (original) string. Which regexpr operator should I use?
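To illustrate the behaviour described here, this is a minimal Python sketch on a made-up line (the sample data and variable names are assumptions, not taken from the real file). `.*` is greedy, so it stretches from the first quote on the line to the last one. Appending `?` makes it lazy, which trims the tail of the match but still starts at the first quote on the line, so on its own it does not isolate the field either.

```python
import re

# Hypothetical sample line modelled on the CSV rows above.
line = ('"Main string", "secondary string", '
        '"a@x.org, stefano.test@whatever-host.com" "footer"')

addr = r'stefano\.test@whatever-host\.com'

# Greedy: each .* grabs as much as possible, so the match runs from the
# first quote on the line to the last one.
greedy = re.search(r'".*' + addr + r'.*"', line).group(0)

# Lazy: .*? grabs as little as possible, so the match stops at the first
# quote after the address, but it still begins at the first quote on the line.
lazy = re.search(r'".*?' + addr + r'.*?"', line).group(0)

print(greedy)
print(lazy)
```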

CodePudding user response:

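If using Python's csv module instead of a single regex is fine for you, you can try this. Note that it prints every field containing an address at whatever-host.com, not only one specific address: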

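import csv
import re

# Match any address at this host; the dot is escaped so it is literal.
pattern = re.compile(r'@whatever-host\.com')

with open('test1', newline='') as f:
    # Splitting on spaces keeps each quoted address list in one field.
    reader = csv.reader(f, quotechar='"', delimiter=' ',
                        quoting=csv.QUOTE_ALL, skipinitialspace=True)
    for line in reader:
        for data in line:
            if pattern.search(data):
                print(data)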

Output:

[email protected], [email protected], [email protected],name4.surname4@whatever-host
[email protected]
[email protected]
[email protected], [email protected],
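If you would rather stay with a single regex, as the question asks, one possible approach is to replace `.*` with `[^"]*` so that the match cannot cross a quote. The sketch below assumes the target address is the one in the question's pattern and that addresses only ever appear inside double-quoted fields. `TARGET`, `FIELD_RE`, `extract_fields` and the sample rows are illustrative names and data, not part of the original file.

```python
import re

# Assumed target address, taken from the pattern in the question.
TARGET = "stefano.test@whatever-host.com"

# One quoted field: opening quote, any non-quote characters, the address,
# any non-quote characters, closing quote.
FIELD_RE = re.compile(r'"[^"]*' + re.escape(TARGET) + r'[^"]*"')

def extract_fields(lines):
    """Return every quoted field that contains TARGET, quotes included."""
    found = []
    for line in lines:
        found.extend(FIELD_RE.findall(line))
    return found

# Hypothetical rows modelled on the CSV in the question.
sample = [
    'row1, value2, value3, value4',
    '"Main string", "secondary string", '
    '"a@x.org, stefano.test@whatever-host.com,b@y" "footer"',
    '"stefano.test@whatever-host.com, c@z.org", 2, 3, 4',
]

for field in extract_fields(sample):
    print(field)
```

Because `[^"]*` can never consume a quote, neither greedy nor lazy matching matters here: the match is always confined to one pair of quotes. The pattern would be fooled by an address sitting between two quoted fields rather than inside one, which is why the csv-based approach is safer on irregular input.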