How to use a regular expression in a loop to extract only names in a sentence?

Time:10-07

I'm trying to extract names that are enclosed in square brackets and appear only after a given substring. In the example sentence below, the substring is "[A]."

"This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"

I'm trying to generate a list of the names that follow that substring: Alpha, Beta, Charlie.

CodePudding user response:

\[A\]\.\[([^\]]*)]

https://regex101.com/r/NF526r/1

That should do the trick for you. It uses a negated character class, [^\]]*, which matches any run of characters other than ], so each capture stops at the first closing bracket.

Here is a demo in Python:

import re

mystring = "This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"

# Raw string avoids invalid-escape warnings; \. matches a literal dot only.
values = re.findall(r"\[A\]\.\[([^\]]*)]", mystring)

print(values)

Result:

['Alpha', 'Beta', 'Charlie']
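
Since the question asks about a loop, the same pattern can also be applied with re.finditer, which yields one match object at a time. This is only a minimal sketch, assuming the dot between the two bracket pairs is meant literally; the names PATTERN and extract_names are illustrative and not part of the original answer.

```python
import re

# Illustrative names; the dot is escaped so only a literal "." matches.
PATTERN = re.compile(r"\[A\]\.\[([^\]]*)]")


def extract_names(text):
    """Collect the bracketed name that follows each "[A]." prefix."""
    names = []
    for match in PATTERN.finditer(text):
        # group(1) is the text captured inside the second pair of brackets.
        names.append(match.group(1))
    return names


sentence = "This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"
print(extract_names(sentence))
```

The explicit loop is useful when each match needs extra handling (for example, checking match.start() for its position); for a plain list, re.findall is shorter.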

CodePudding user response:

Try this:

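# \. matches a literal dot; each row becomes a list of captured names.
df['col'] = df['col'].str.findall(r"\[A\]\.\[([^\]]*)]")
# explode returns a new frame with one row per list element.
df.explode('col')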

        col
0    Alpha
0     Beta
0  Charlie

Where 'col' is the column with your text.
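
For a version that can be run end to end, here is a small sketch that builds a one-row DataFrame first. The frame and the variable name result are assumptions made for illustration; 'col' stands in for whatever your text column is called.

```python
import pandas as pd

# Hypothetical one-row frame; 'col' stands in for your own text column.
df = pd.DataFrame({
    "col": ["This is [A].[Alpha] and this is [A].[Beta] and this is [A].[Charlie] and so on"]
})

# str.findall returns a list of captured names for each row ...
df["col"] = df["col"].str.findall(r"\[A\]\.\[([^\]]*)]")
# ... and explode gives each list element its own row.
result = df.explode("col")
print(result)
```

explode keeps the original row index, which is why every row is labelled 0; calling result.reset_index(drop=True) gives a fresh 0..n-1 index if you need one.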
