How to catch text till parenthesis in Python Regex?

Time:07-01

I have a concatenated text that I want to split using Regex. Luckily there is a pattern. The pattern is structured this way: (seconds) some text (seconds) some other text (seconds) some other text

(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE

I want to split each block so output should be like this.

(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN  HOW CAN I HELP YOU 
(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ABOUT THAT BILL 
(24-29) Agent: OK  AY 

So far I was able to create this: \(\d*-\d*\)\s*\w*:\s*, but it only catches the prefix of each block, e.g. "(1-4) Agent: ". I can't figure out the rest, and I have tried many things.

Here is a Regex101 link showing where I am stuck.

CodePudding user response:

You could try match groups:

(\(\d*-\d*\)\s*\w+:[\s?\w]+[^(]){1}

I'm not much of a regex pro but I do try. Let me know if it helped :D

CodePudding user response:

With

\(\d*-\d*\)\s*\w*:[^(]*

you can catch everything after the colon that is not an open parenthesis.
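As a minimal sketch of how this could be used from Python (the sample string is shortened from the question, and the variable names are only illustrative):

```python
import re

# Sample text shortened from the question.
s = ("(1-4) Agent: THANK YOU FOR CALLING (5-22) Customer: HI KEVIN ... "
     "(24-29) Agent: OK AY")

# [^(]* consumes everything up to (but not including) the next "(".
pattern = r"\(\d*-\d*\)\s*\w*:[^(]*"

# Each match keeps the trailing space before the next "(", so strip it.
blocks = [m.strip() for m in re.findall(pattern, s)]
print(blocks)
# ['(1-4) Agent: THANK YOU FOR CALLING', '(5-22) Customer: HI KEVIN ...', '(24-29) Agent: OK AY']
```

Note that this assumes the spoken text itself never contains an opening parenthesis; if it does, the block would be cut off at that point.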

CodePudding user response:

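In the pattern that you tried, the digits between the parentheses are optional because of the * quantifier, and \w*:\s* matches only optional word characters, a colon, and optional whitespace chars. Nothing in the pattern matches the text that follows, so each match stops right after the speaker label.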
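A small sketch that illustrates both points, using a shortened version of the sample text (the variable names are just for illustration):

```python
import re

s = "(1-4) Agent: HOW CAN I HELP YOU (5-22) Customer: HI KEVIN"

# The pattern from the question: it stops right after the colon and whitespace.
question_pattern = r"\(\d*-\d*\)\s*\w*:\s*"
prefixes = re.findall(question_pattern, s)
print(prefixes)  # ['(1-4) Agent: ', '(5-22) Customer: ']

# Because of the *, even a pair of parentheses with no digits is accepted.
empty_ok = re.fullmatch(r"\(\d*-\d*\)", "(-)") is not None
# With + at least one digit is required on each side of the hyphen.
strict_ok = re.fullmatch(r"\(\d+-\d+\)", "(-)") is not None
print(empty_ok, strict_ok)  # True False
```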


You can use:

\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)

Explanation

  • \(\d+-\d+\) Match (, 1+ digits, -, 1+ digits and )
  • .*? Match any character, as few as possible
  • (?= Positive lookahead
    • \(\d+-\d+\) The digit pattern between parentheses
    • | Or
    • $ End of string (For the last occurrence)
  • ) Close lookahead

Regex demo

Example code

import re

pattern = r"\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)"

s = "(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE"

print(re.findall(pattern, s))

Output

[
'(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU ',
'(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ',
'(24-29) Agent: OK AY ',
'(662-662) Customer: THANK YOU TOO ',
'(663-664) Agent: THANKS BYE NOW ',
'(664-664) Customer: BYE'
]
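
If you also need the seconds and the speaker as separate values, the same lookahead idea can be combined with named groups. This is only a sketch under the assumption that every block looks like "(start-end) Speaker: text"; the group names (start, end, speaker, text) are my own and not part of the question:

```python
import re

s = ("(1-4) Agent: THANK YOU FOR CALLING XCOMPANY (5-22) Customer: HI KEVIN ... "
     "(664-664) Customer: BYE")

# Named groups for the timestamps, the speaker and the spoken text.
# The lazy text group stops before the next "(digits-digits)" or at end of string,
# and the \s* in front of the lookahead drops the trailing whitespace.
pattern = re.compile(
    r"\((?P<start>\d+)-(?P<end>\d+)\)\s*(?P<speaker>\w+):\s*"
    r"(?P<text>.*?)\s*(?=\(\d+-\d+\)|$)"
)

turns = [
    (int(m["start"]), int(m["end"]), m["speaker"], m["text"])
    for m in pattern.finditer(s)
]
for turn in turns:
    print(turn)
# (1, 4, 'Agent', 'THANK YOU FOR CALLING XCOMPANY')
# (5, 22, 'Customer', 'HI KEVIN ...')
# (664, 664, 'Customer', 'BYE')
```

Since . does not match a newline by default, you would need the re.DOTALL flag if the transcript can span multiple lines.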