Working with regular expressions is always a nightmare (for me!). I don't clearly understand the difference between greedy and non-greedy (lazy) matching, or how to activate one or the other :(
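A minimal sketch of the difference, using Python's `re` module and a made-up string: a plain `*` is greedy and matches as much as it can, while appending `?` (`*?`) makes it lazy, so it matches as little as possible.

```python
import re

text = '"first" and "second"'

# Greedy: .* keeps going and stops only at the LAST quote it can reach.
greedy = re.search(r'".*"', text).group()

# Lazy: .*? stops at the FIRST quote that lets the pattern succeed.
lazy = re.search(r'".*?"', text).group()

print(greedy)  # "first" and "second"
print(lazy)    # "first"
```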
I have a CSV file like this:
row1, value2, value3, value4
row2, value2, value3, value4
"Main string", "secondary string", "[email protected], [email protected], [email protected],name4.surname4@whatever-host" "footer"
"ABC", "ABC", "[email protected]" "last-row-value"
row5, value2, value3, value4
"Main string", "secondary string", "[email protected]" "last-row-value"
"[email protected], [email protected]", 2, 3, 4
row999, value2, value3, value4
My goal (with a regex) is to extract only the (full) quoted string on rows containing "stefano.test@whatever-host.com":
"[email protected], [email protected], [email protected],name4.surname4@whatever-host"
"[email protected]"
"[email protected]"
"[email protected], [email protected]"
I started from this regex...
(").*(stefano\.test@whatever-host\.com).*(")
...but it returns the full (original) line, from the first quote to the last one, instead of just that field. Which regex operator should I use?
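Making the quantifiers lazy is probably not enough here: the regex engine returns the leftmost match, so a lazy pattern still starts at the first quote on the line and crosses field boundaries to reach the address. One common fix is to forbid quotes inside the match with the negated class `[^"]*`, which keeps the match inside a single quoted field. A minimal sketch, using lines modeled on the sample data; the `name2.surname2` addresses are hypothetical stand-ins for the obfuscated ones:

```python
import re

# [^"]* matches any run of characters except a double quote,
# so the whole match is confined to one quoted field.
PATTERN = re.compile(r'"[^"]*stefano\.test@whatever-host\.com[^"]*"')

# For comparison: lazy quantifiers alone still start at the leftmost quote.
LAZY = re.compile(r'".*?stefano\.test@whatever-host\.com.*?"')

lines = [
    '"Main string", "secondary string", "name2.surname2@whatever-host.com, '
    'stefano.test@whatever-host.com,name4.surname4@whatever-host" "footer"',
    'row5, value2, value3, value4',
    '"stefano.test@whatever-host.com, name2.surname2@whatever-host.com", 2, 3, 4',
]

matches = []
for line in lines:
    m = PATTERN.search(line)
    if m:  # rows without the address produce no match and are skipped
        matches.append(m.group())

for found in matches:
    print(found)

# The lazy version swallows "Main string" and "secondary string" too.
print(LAZY.search(lines[0]).group())
```

If you want the field without the surrounding quotes, wrap the inner part in a capturing group, e.g. `"([^"]*stefano\.test@whatever-host\.com[^"]*)"`, and read `m.group(1)`.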
Answer:
You can try this, if it fits your needs: let the csv module split each line into fields, then search each field for the address:
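import csv
import re

with open('test1', newline='') as f:
    # Split on spaces; each quoted field (e.g. the e-mail list) stays in one piece.
    for line in csv.reader(f, quotechar='"', delimiter=' ',
                           quoting=csv.QUOTE_ALL, skipinitialspace=True):
        for data in line:
            # Keep only fields containing the target address (dots escaped).
            match = re.search(r'stefano\.test@whatever-host\.com', data)
            if match:
                print(data)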
Output:
[email protected], [email protected], [email protected],name4.surname4@whatever-host
[email protected]
[email protected]
[email protected], [email protected],
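The trailing comma on the last output line appears to be a side effect of `delimiter=' '`: the comma right after the closing quote is not a delimiter, so with the default non-strict dialect the reader appends it to the field. A small sketch reproducing this on one in-memory row and stripping it with `rstrip(',')`; the `name2.surname2` address is a hypothetical stand-in:

```python
import csv
import io

# One row modeled on the sample data (the second address is hypothetical).
sample = '"stefano.test@whatever-host.com, name2.surname2@whatever-host.com", 2, 3, 4\n'

reader = csv.reader(io.StringIO(sample), quotechar='"', delimiter=' ',
                    quoting=csv.QUOTE_ALL, skipinitialspace=True)
row = next(reader)

# The comma that followed the closing quote ends up inside the field.
raw = row[0]
# Remove it before using the value.
clean = raw.rstrip(',')

print(repr(raw))
print(repr(clean))
```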