Python Regular Expression - Get Text starting in the next line after the match was found

Time:11-25

I have a question on using regular expressions in Python. This is a part of the text I am analysing.

Amit Jawaharlaz Daryanani,  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]\n I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n 

My goal is to extract the following part of the text by matching the analyst's name, Amit Jawaharlaz Daryanani: \n I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n

I cannot simply match from one \n to the next, because the text is much longer and I specifically need the line that comes after his name.

I tried: re.findall(r'(?<=Amit Jawaharlaz Daryanani).*?(?=\n)', text)

But the output is only the rest of the line containing his name:

[',  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]']

So how can I start after the first \n that comes after his name until the second \n after his name?

CodePudding user response:

You can use a capture group:

\bAmit Jawaharlaz Daryanani\b.*\n\s*(.*)\n

Explanation

  • \bAmit Jawaharlaz Daryanani\b Match the name
  • .*\n Match the rest of the line and a newline
  • \s*(.*)\n Match optional whitespace chars, and capture a whole line in group 1 followed by matching a newline


import re

pattern = r"\bAmit Jawaharlaz Daryanani\b.*\n\s*(.*)\n"

s = ("Amit Jawaharlaz Daryanani,  Evercore ISI Institutional Equities, Research Division - Senior MD & Fundamental Research Analyst   [19]\n"
     " I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?\n"
     " \n")

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?
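If the same extraction is needed for several analysts, the pattern can be built from the name at run time. The following is only a sketch: the helper name line_after_name and the sample text are made up for illustration, and re.escape is used on the assumption that a name might contain characters that are special in a regex.

```python
import re

def line_after_name(name, text):
    """Return the first non-blank line after the line containing `name`, or None.

    Hypothetical helper, not part of the original answer.
    """
    # re.escape protects characters such as "." or "(" inside the name.
    pattern = rf"\b{re.escape(name)}\b.*\n\s*(.*)\n"
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Made-up sample text with the same layout as the transcript in the question.
sample = (
    "Jane Doe,  Example Corp - Analyst   [1]\n"
    " Hello there.\n"
    "Next speaker line\n"
)

print(line_after_name("Jane Doe", sample))
print(line_after_name("Nobody", sample))
```

Because \s* also matches newlines, blank lines between the name line and the text are skipped. The captured line must itself end with a newline, so a line at the very end of the text without a trailing \n would not be returned.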

CodePudding user response:

Try this:

  • non-capturing group for the name
  • look for the first \n
  • capturing group until the second \n

re.findall(r'(?:Amit Jawaharlaz Daryanani).*?\n(.*?)\n', text)

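This works because . does not match a newline by default, so .*? cannot run past the end of the line containing the name and the following \n is the first one after it. The non-greedy ? is therefore not strictly required here; it only starts to matter if the re.DOTALL flag is set. Note that the captured line keeps its leading space.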

Output:

[' I have 2 as well. I guess, first off, on the channel inventory, I was hoping if you could talk about how did channel inventory look like in the March quarter because it sounds like it may be below the historical ranges. And then the discussion you had for June quarter performance of iPhones, what are you embedding from a channel building back inventory levels in that expectation?']
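To see how greediness interacts with newlines, here is a small comparison on made-up sample text (the name and lines are placeholders, not from the transcript). It is only a sketch of the default behaviour of . versus re.DOTALL.

```python
import re

# Made-up text: a name line followed by two more lines.
text = "Jane Doe, Example Corp   [1]\n first line\n second line\n"

# By default "." does not match "\n", so lazy and greedy agree.
lazy = re.findall(r"Jane Doe.*?\n(.*?)\n", text)
greedy = re.findall(r"Jane Doe.*\n(.*)\n", text)

# With re.DOTALL "." also matches "\n"; the greedy version then
# runs as far as it can and captures a later line instead.
greedy_dotall = re.findall(r"Jane Doe.*\n(.*)\n", text, flags=re.DOTALL)
lazy_dotall = re.findall(r"Jane Doe.*?\n(.*?)\n", text, flags=re.DOTALL)

print(lazy)
print(greedy)
print(greedy_dotall)
print(lazy_dotall)

# Strip the leading space if it is not wanted.
cleaned = [s.strip() for s in lazy]
print(cleaned)
```

Without flags both patterns capture the line right after the name. With re.DOTALL only the lazy pattern still does, which is the case where the non-greedy quantifiers are actually needed.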