I have this regex to scan a paragraph and extract some data from it:
([0-9]{1,2}:{0,1}[0-9]{0,2}[a-z]{0,2})[\s\D\s]+([0-9]{1,2}:{0,1}[0-9]{0,2}[a-z]{0,2}),(.+),(\s\w{1,2} de [\wç]+ de \d{4})?(\s\w+ \d{1,2}, \d{4})?$
I need to get the hours, the title and the date from texts like these:
EXAMPLE 1: In this example the number 130 causes the issue, and I can't get the first hour
1:30pm to 4:30pm, Aniversário amigo matteo, Ana Montoya, Accepted, Location: Kids Buffet Infantil
Rua do Triunfo, 130, Brookling, Hello - SP, 04602-005, Brasil, November 23, 2022
EXAMPLE 2: This one works correctly
8am to 9:30am, All Hearts meeting, Ana Montoya, Accepted, Location: https://us02web.zoom.us/j/1234?pwd=1234, November 21, 2022
I need to get the two hours, the title text and the final date.
CodePudding user response:
Here is a modified regex with your sample input strings:
[
  '1:30pm to 4:30pm, Aniversário amigo matteo, Ana Montoya, Accepted, Location: Kids Buffet Infantil Rua do Triunfo, 130, Brookling, Hello - SP, 04602-005, Brasil, November 23, 2022',
  '8am to 9:30am, All Hearts meeting, Ana Montoya, Accepted, Location: https://us02web.zoom.us/j/1234?pwd=1234, November 21, 2022'
].forEach(str => {
  // Groups: 1 = start hour, 2 = end hour, 3 = title, 4 = final date
  const m = str.match(/^(\d\d?(?::\d\d)?[ap]m) to (\d\d?(?::\d\d)?[ap]m), *([^,]+).* ([a-z]+ \d+, \d{4})/i);
  console.log(m);
});
Output:
[
"1:30pm to 4:30pm, Aniversário amigo matteo, Ana Montoya, Accepted, Location: Kids Buffet Infantil Rua do Triunfo, 130, Brookling, Hello - SP, 04602-005, Brasil, November 23, 2022",
"1:30pm",
"4:30pm",
"Aniversário amigo matteo",
"November 23, 2022"
]
[
"8am to 9:30am, All Hearts meeting, Ana Montoya, Accepted, Location: https://us02web.zoom.us/j/1234?pwd=1234, November 21, 2022",
"8am",
"9:30am",
"All Hearts meeting",
"November 21, 2022"
]
Explanation of regex:
^ -- anchor at start of string
( -- capture group 1 start
\d\d? -- 1 or 2 digits
(?::\d\d)? -- optional non-capture group for colon and 2 digits
[ap]m -- literal am or pm
) -- capture group 1 end
 to  -- literal text
(\d\d?(?::\d\d)?[ap]m) -- capture group 2, same as above
, * -- comma and optional spaces
([^,]+) -- title up to next comma
.*  -- greedy scan to last space, followed by:
([a-z]+ \d+, \d{4}) -- date format Mmmmm dd, yyyy
i -- ignore case flag
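As a minimal sketch of how the pattern explained above could be wrapped for reuse: the helper name parseEvent, the constant EVENT_RE and the named groups (start, end, title, date) are illustrative assumptions, not part of the answer. The sketch also assumes the text may contain a line break inside the address, as in EXAMPLE 1 of the question, so it collapses whitespace first; without that step, .* would stop at the line break.

```javascript
// Same pattern as in the answer, with named groups added for readability.
// EVENT_RE and parseEvent are hypothetical names used only for this sketch.
const EVENT_RE = /^(?<start>\d\d?(?::\d\d)?[ap]m) to (?<end>\d\d?(?::\d\d)?[ap]m), *(?<title>[^,]+).* (?<date>[a-z]+ \d+, \d{4})/i;

function parseEvent(text) {
  // Collapse line breaks and runs of whitespace into single spaces so the
  // greedy `.*` can reach the date when the address wraps onto a new line.
  const normalized = text.replace(/\s+/g, ' ').trim();
  const m = normalized.match(EVENT_RE);
  // Copy the named groups into a plain object, or return null on no match.
  return m ? { ...m.groups } : null;
}

const sample =
  '1:30pm to 4:30pm, Aniversário amigo matteo, Ana Montoya, Accepted, Location: Kids Buffet Infantil\n' +
  'Rua do Triunfo, 130, Brookling, Hello - SP, 04602-005, Brasil, November 23, 2022';

console.log(parseEvent(sample));
```

Named groups need an engine with ES2018 support. On older engines the numbered groups from the answer work the same way.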
CodePudding user response:
([0-9]{1,2}:{0,1}[0-9]{0,2}[a-z]{0,2})[\s\D\s]+([0-9]{1,2}:{0,1}[0-9]{0,2}[a-z]{0,2}),(.+),(\s\w{1,2} de [\wç]+ de \d{4})?(\s\w+ \d{1,2}, \d{4})?.*$