Multiple occurrences of match in row with different prefix and multiline

Time:06-25

I am not sure whether regex supports this. I want to extract all the mail addresses from the "To:" line only. This is the given string:

Content-Type: application/ms-tnef; name="winmail.dat"
Content-Transfer-Encoding: binary
From: Max Mustermann <[email protected]>
To: autorouter.test <[email protected]>, Max Mustermann<[email protected]>
CC: Max Mustermann <[email protected]>, Max Mustermann<[email protected]>
Subject: Subject-Foobar
Thread-Topic: Subject-Foobar
Thread-Index: AdiHB4KcplQHHfCjQW+1j4r7qtj8wg==
Date: Thu, 23 Jun 2022 15:51:03 +0200
Message-ID: <[email protected]>
Accept-Language: de-DE, en-US
Content-Language: de-DE
X-MS-Has-Attach:
X-MS-Exchange-Organization-SCL: -1
X-MS-TNEF-Correlator: <[email protected]>

I can select all mail addresses with "<.*>", but not if I try to restrict it to the lines starting with "To:".

This would be the desired output:

  1. vr.test@foo-gruppe
  2. [email protected]

Is this possible?

CodePudding user response:

In Java you can use a finite quantifier in a positive lookbehind assertion:

(?<=^To:.{0,1000})<[^@<>]+@[^@<>]+>

Explanation

  • (?<= Assert what is to the left is
    • ^ Start of a line (this needs the multiline flag, since the To: header is not the first line of the input)
    • To: Match literally
    • .{0,1000} Optionally repeat any character except a newline 0 - 1000 times (change it accordingly)
  • ) Close the lookbehind
  • < Match the opening <
  • [^@<>]+@[^@<>]+ Match an @ with one or more characters on each side, none of them @, < or >
  • > Match the closing >

See a regex demo
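
As a minimal sketch of how this pattern might be applied in Java: the class name ToLineAddresses, the method extract, and the sample addresses are made up for illustration. It assumes the headers are available as one string and compiles the pattern with Pattern.MULTILINE so that ^ can match at the start of the To: line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToLineAddresses {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    // The bounded quantifier {0,1000} keeps the lookbehind at a finite length, which Java requires.
    private static final Pattern TO_ADDRESS = Pattern.compile(
            "(?<=^To:.{0,1000})<[^@<>]+@[^@<>]+>", Pattern.MULTILINE);

    // Returns every <...@...> token on a line that starts with "To:", brackets included.
    static List<String> extract(String headers) {
        List<String> result = new ArrayList<>();
        Matcher m = TO_ADDRESS.matcher(headers);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample headers, not the ones from the question.
        String headers = String.join("\n",
                "From: Alice <alice@example.com>",
                "To: Bob <bob@example.com>, Carol<carol@example.org>",
                "CC: Dave <dave@example.com>");
        System.out.println(extract(headers));
        // prints [<bob@example.com>, <carol@example.org>]
    }
}
```

Because . does not match a line break by default, the lookbehind cannot reach back into an earlier line, so addresses on the From: and CC: lines are skipped. The match is case-sensitive, so a header written as "TO:" would need Pattern.CASE_INSENSITIVE as well.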


You might also capture what is between the angle brackets, and if the @ part is not necessary, use only the negated character class:

(?<=^To:.{0,1000})<([^<>]*)>

Regex demo
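
The capture-group variant could be used roughly like this. Again a sketch under assumptions: ToLineCapture, extract, and the sample data are illustrative names, and Pattern.MULTILINE is assumed so that ^ matches at the start of the To: line. Reading group 1 returns the text between the brackets without the brackets themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToLineCapture {
    // Group 1 holds whatever sits between < and > on a line that starts with "To:".
    private static final Pattern TO_ENTRY = Pattern.compile(
            "(?<=^To:.{0,1000})<([^<>]*)>", Pattern.MULTILINE);

    // Returns the bracket contents only, e.g. "bob@example.com" rather than "<bob@example.com>".
    static List<String> extract(String headers) {
        List<String> result = new ArrayList<>();
        Matcher m = TO_ENTRY.matcher(headers);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample headers, not the ones from the question.
        String headers = String.join("\n",
                "From: Alice <alice@example.com>",
                "To: Bob <bob@example.com>, Carol<carol@example.org>",
                "CC: Dave <dave@example.com>");
        System.out.println(extract(headers));
        // prints [bob@example.com, carol@example.org]
    }
}
```

Since this version no longer requires an @, it returns anything in angle brackets on the To: line, so it is only a good fit when that line is known to contain nothing but addresses in brackets.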

CodePudding user response:

For that, I believe you need to use a positive lookbehind, like this:

(?<=To:.*?)([\w.-]+@[\w.-]+)

Or

(?<=To:.*?)<(.+?)>