I have a concatenated text that I want to split using regex. Luckily it follows a pattern, structured this way: (seconds) some text (seconds) some other text (seconds) some other text
(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE
I want to split it into blocks, so the output should look like this:
(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU
(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ABOUT THAT BILL
(24-29) Agent: OK AY
So far I was able to create this: \(\d*-\d*\)\s*\w*:\s*
but it only matches the prefix of each block, e.g. (1-4) Agent: and not the text that follows. I can't figure out the rest, and I have tried many things.
Here is a Regex101 link showing where I am stuck.
CodePudding user response:
You could try match groups:
(\(\d*-\d*\)\s*\w+:[\s?\w]+[^(]){1}
I'm not much of a regex pro but I do try. Let me know if it helped :D
CodePudding user response:
With
\(\d*-\d*\)\s*\w*:[^(]*
you can catch everything after the colon that is not an open parenthesis.
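As a rough illustration of this approach, the sketch below runs that pattern through Python's re.findall on a shortened version of the sample text. The variable names and the shortened input are assumptions made for the example. Note that the negated class [^(] ends a block at any open parenthesis, so a block whose text itself contains a "(" would be cut short.

```python
import re

# Shortened version of the transcript from the question (assumed sample).
s = ("(1-4) Agent: THANK YOU FOR CALLING (5-22) Customer: HI KEVIN ... "
     "(24-29) Agent: OK AY")

# Timestamp, speaker label, colon, then everything up to the next "(".
pattern = r"\(\d*-\d*\)\s*\w*:[^(]*"

# Each match keeps the trailing space before the next "(", so strip it.
blocks = [m.strip() for m in re.findall(pattern, s)]
for block in blocks:
    print(block)
```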
CodePudding user response:
In the pattern that you tried, the digits between the parentheses are optional because of the * quantifier, and \w*:\s* matches only optional word characters, a colon and optional whitespace chars, so the match stops before the text that follows.
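A quick way to see both points is to run the original pattern in Python. This is only a small check on a made-up, shortened input, and the variable names are hypothetical:

```python
import re

s = "(1-4) Agent: HELLO (5-22) Customer: HI"

# Pattern from the question: it stops after the colon and optional whitespace.
prefix_only = re.findall(r"\(\d*-\d*\)\s*\w*:\s*", s)
print(prefix_only)  # ['(1-4) Agent: ', '(5-22) Customer: ']

# Because * allows zero digits, the timestamp part also accepts "(-)".
loose = re.findall(r"\(\d*-\d*\)", "(-) (1-4)")
print(loose)  # ['(-)', '(1-4)']
```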
You can use:
\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)
Explanation
\(\d+-\d+\) Match (, 1+ digits, -, 1+ digits and )
.*? Match any character, as few as possible
(?= Positive lookahead
\(\d+-\d+\) The digit pattern between parentheses
| Or
$ End of string (for the last occurrence)
) Close lookahead
Example code
import re
pattern = r"\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)"
s = "(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE"
print(re.findall(pattern, s))
Output
[
'(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU ',
'(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ',
'(24-29) Agent: OK AY ',
'(662-662) Customer: THANK YOU TOO ',
'(663-664) Agent: THANKS BYE NOW ',
'(664-664) Customer: BYE'
]
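Since the question asks to split the text, an alternative sketch is to use re.split on the whitespace that directly precedes each timestamp. This is not part of the answer above. It assumes every block starts with a "(digits-digits)" marker preceded by whitespace, and the shortened input and variable names are made up for the example. It also avoids the trailing spaces seen in the findall output.

```python
import re

# Shortened version of the transcript from the question (assumed sample).
s = ("(1-4) Agent: THANK YOU FOR CALLING (5-22) Customer: HI KEVIN ... "
     "(24-29) Agent: OK AY (664-664) Customer: BYE")

# Split on the whitespace that sits directly before a "(digits-digits)" marker.
# The lookahead keeps the marker itself as part of the following block.
parts = re.split(r"\s+(?=\(\d+-\d+\))", s.strip())

# One block per line, as requested in the question.
print("\n".join(parts))
```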