Python Regular Expression: re.sub to replace matches

Time:11-30

I am trying to analyze an earnings call transcript using Python regular expressions. I want to delete the lines that contain only the name and position of the next speaker.

This is an excerpt of the text I want to analyze:

"Questions and Answers\nOperator [1]\n\n Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]\n I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.\n Timothy D. Cook, Apple Inc. - CEO & Director [3]\n ..."

Each line that I want to delete ends with a number in square brackets, such as [1].

So I used the following line of code to get these lines:

name_lines = re.findall(r'.*[\d]]', text)

This works and gives me the following list: ['Operator [1]', ' Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]', ' Timothy D. Cook, Apple Inc. - CEO & Director [3]']

So, in the next step, I want to remove these strings from the text using the following code:

for i in range(0,len(name_lines)): 
    text = re.sub(name_lines[i], '', text)

But this does not work. Replacing a single entry without the loop does not work either, and I have no clue why.

Also, if I use re.findall to search for the lines I obtained from the first line of code, I don't get a match.

CodePudding user response:

Use re.sub with the pattern itself to remove the matching lines in one step, instead of collecting them first:

import re

text = """\
Questions and Answers
Operator [1]

Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]
I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
Timothy D. Cook, Apple Inc. - CEO & Director [3]"""

text = re.sub(r".*\d]", "", text)
print(text)

Prints:

Questions and Answers



I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
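
The output above still contains the empty lines left behind by the removed speaker lines. If those are unwanted, one possible variation is to match the whole line, including its trailing newline. This is only a sketch: the shortened sample text and the names speaker_line and cleaned are made up for illustration, and it assumes the bracketed number always sits at the end of the line.

```python
import re

# Shortened sample in the same shape as the transcript above (illustrative only).
text = (
    "Questions and Answers\n"
    "Operator [1]\n"
    "\n"
    "Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]\n"
    "I hope everyone is well.\n"
    "Timothy D. Cook, Apple Inc. - CEO & Director [3]"
)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# \[\d+\] matches a literal "[", one or more digits, and a literal "]".
# The optional \n after $ removes the line break together with the line.
speaker_line = re.compile(r"^.*\[\d+\][ \t]*$\n?", flags=re.MULTILINE)

cleaned = speaker_line.sub("", text)
print(cleaned)
```

Blank lines that were already in the text, such as the one after "Operator [1]", are kept, because the pattern only matches lines that end in a bracketed number.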

CodePudding user response:

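The first argument to re.sub is treated as a regular expression, so the square brackets are interpreted as a character class instead of matching literally. The pattern 'Operator [1]', for example, looks for the text 'Operator 1', which never occurs in the transcript.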

You don't need a regular expression for this replacement at all though (and you also don't need the loop counter i):

for name_line in name_lines:
    text = text.replace(name_line, '')
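
If you would rather keep re.sub, for example because you want to pass regex flags later, the found strings can be turned into literal patterns with re.escape. The following is a minimal sketch on a shortened sample text (made up for illustration); the variable unescaped_match only exists to show why the original attempt finds nothing.

```python
import re

# Shortened sample in the same shape as the transcript (illustrative only).
text = (
    "Questions and Answers\n"
    "Operator [1]\n"
    "\n"
    " Timothy D. Cook, Apple Inc. - CEO & Director [3]\n"
    " ..."
)

name_lines = re.findall(r".*\d\]", text)

# Used as a pattern, "[1]" is a character class that matches the single
# character "1", so 'Operator [1]' searches for 'Operator 1' and finds nothing.
unescaped_match = re.search(name_lines[0], text)
print(unescaped_match)

# re.escape backslash-escapes the special characters, so each string
# is matched literally.
for name_line in name_lines:
    text = re.sub(re.escape(name_line), "", text)

print(text)
```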