Requirement: I receive emails that follow a template, and I need to extract some text from each email. I am converting the whole email body to a string.
The email text looks like this:
some body text which I don't need
Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000
I was thinking of having a regex that looks for
- A word "Discussion"
- On the next line, a date-time line in the format "Tue 26/04/2022/2:48 PM UTC 10/ ABC User-"
- Then picks up the following line(s) until we reach this address line: "ABC Company Australia | XYZ St | Sydney NSW 2000"
Is it possible? Can someone please help with the regex?
TIA.
CodePudding user response:
You may try this regex:
Discussion.*?\n+([A-Za-z]+ (?:\d{2}\/){2}\d{4}\/\d+:\d+[^\n]+)(.*)?ABC Company Australia \| XYZ St \| Sydney NSW 2000
Explanation:
Discussion.*?\n+
The regex starts where the string "Discussion" begins; .*?\n+ then skips the rest of that line and the newline(s) after it.
([A-Za-z]+ (?:\d{2}\/){2}\d{4}\/\d+:\d+[^\n]+)
Next it looks for the date format you described; [^\n]+ takes everything up to the next newline.
(.*)?
This captures everything after the date line. With the s flag, the dot also matches newlines.
ABC Company Australia \| XYZ St \| Sydney NSW 2000
Matching ends when this footer is found.
- The date line is kept in group 1 and the body you need in group 2.
source:
const regex = /Discussion.*?\n+([A-Za-z]+ (?:\d{2}\/){2}\d{4}\/\d+:\d+[^\n]+)(.*)?ABC Company Australia \| XYZ St \| Sydney NSW 2000/gms;
const str = `some body text which I don't need
Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000
`;
const match = regex.exec(str);
if (match !== null) {
  console.log(match[1]); // date/user line
  console.log(match[2]); // body text (still includes the surrounding newlines)
}
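One thing to watch with the pattern above: with the s flag, a greedy (.*) runs to the last occurrence of the footer, not the first. The following is only a sketch to illustrate that, assuming a hypothetical mail in which the footer appears twice (for example because an older reply is quoted); the sample text and variable names are made up for the illustration.

```javascript
// Hypothetical sample: the footer line occurs twice in the mail body.
const FOOTER = "ABC Company Australia | XYZ St | Sydney NSW 2000";
const mail = [
  "Discussion:",
  "Tue 26/04/2022/2:48 PM UTC 10/ ABC User-",
  "TEST description - this should be logged as a comment.",
  FOOTER,
  "> quoted older reply",
  FOOTER,
].join("\n");

// Same building blocks as the regex above, split up for readability.
const header = String.raw`Discussion.*?\n+([A-Za-z]+ (?:\d{2}\/){2}\d{4}\/\d+:\d+[^\n]+)`;
const footer = String.raw`ABC Company Australia \| XYZ St \| Sydney NSW 2000`;

const greedy = new RegExp(header + "(.*)" + footer, "s");  // stops at the LAST footer
const lazy = new RegExp(header + "(.*?)" + footer, "s");   // stops at the FIRST footer

const greedyBody = greedy.exec(mail)[2].trim();
const lazyBody = lazy.exec(mail)[2].trim();

console.log(lazyBody);   // only the comment line
console.log(greedyBody); // comment line, first footer and the quoted reply
```

If the footer can only ever appear once, both variants give the same result; the lazy form is just the safer default.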
CodePudding user response:
If only the content the OP is interested in matters, the following regex is already sufficient ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/3]
const regXMailContent =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)/;
console.log(
regXMailContent.exec(multilineMail)?.groups?.content
);
In case the company footer has to match exactly, one has to make it part of the above regex as follows ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)\n+ABC Company Australia \| XYZ St \| Sydney NSW 2000/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/4]
const regXMailContent =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>.*)\n+ABC Company Australia \| XYZ St \| Sydney NSW 2000/;
console.log(
regXMailContent.exec(multilineMail)?.groups?.content
);
If the OP also wants to capture the date and the user, one would extend the first regex like this ...
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}).*\n+(?<content>.*)/
/Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/2]
const regXMailDateAndContent =
  /Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}).*\n+(?<content>.*)/;
// see ... [https://regex101.com/r/v8FXCA/1]
const regXMailDateUserAndContent =
  /Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/;
console.log(
regXMailDateAndContent.exec(multilineMail)?.groups
);
console.log(
regXMailDateUserAndContent.exec(multilineMail)?.groups
);
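Since exec returns null when the mail does not follow the template, it can help to wrap the lookup in a small function. This is a minimal sketch only; the helper name parseDiscussion and the shape of its return value are assumptions made for the illustration, not part of the answer above.

```javascript
// Date, user and content pattern, written with its + quantifiers.
const regXDateUserAndContent =
  /Discussion:\n(?<date>[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}\/[^/]+)\/\s*(?<user>.*?)-?\s*\n+(?<content>.*)/;

// Hypothetical helper: returns null when the template is not found,
// otherwise a plain object built from the named groups.
function parseDiscussion(mailText) {
  const groups = regXDateUserAndContent.exec(mailText)?.groups;
  if (!groups) return null;
  const { date, user, content } = groups;
  return { date, user, content: content.trim() };
}

const sampleMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST description - this should be logged as a comment. --- This is the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;

console.log(parseDiscussion(sampleMail));
console.log(parseDiscussion("a mail without the template")); // null
```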
But in case the content to be extracted is multiline text, the regex has to include the company footer in order to identify the correct match. The second regex then changes to ... /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*)ABC Company Australia \| XYZ St \| Sydney NSW 2000/
const multilineMail = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST
description - this should be
logged as a comment. --- This is
the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;
// see ... [https://regex101.com/r/v8FXCA/5]
const regXMailMultilineContent =
  /Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*)ABC Company Australia \| XYZ St \| Sydney NSW 2000/;
console.log(
regXMailMultilineContent.exec(multilineMail)?.groups?.content
);
All of the above regex patterns make use of named capturing groups.
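If the footer text is not fixed, the multiline pattern can be built from a plain string instead of being hard-coded. The sketch below assumes that is what is wanted; escapeRegExp and extractContent are hypothetical helper names, and the content group is made lazy so that matching stops at the first footer line.

```javascript
// Escape regex metacharacters so a literal string can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the (possibly multiline) text between the
// date/user line and the given footer line, or null if nothing matches.
function extractContent(mailText, footer) {
  const pattern = new RegExp(
    String.raw`Discussion:\n[a-zA-Z]{1,3}\s+\d{2}\/\d{2}\/\d{4}.*\n+(?<content>(?:.*\n)*?)` +
      escapeRegExp(footer)
  );
  // trimEnd drops the newline that precedes the footer line.
  return pattern.exec(mailText)?.groups?.content.trimEnd() ?? null;
}

const multilineSample = `Discussion:
Tue 26/04/2022/2:48 PM UTC 10/ ABC User-
TEST
description - this should be
logged as a comment. --- This is
the part I need
ABC Company Australia | XYZ St | Sydney NSW 2000`;

console.log(
  extractContent(multilineSample, "ABC Company Australia | XYZ St | Sydney NSW 2000")
);
```

Escaping matters here because the footer contains |, which would otherwise be read as alternation.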