Make a list return each of its elements as individual strings to be placed in a regular expression

Time:03-09

I am facing a challenge in Python where I have a list that contains multiple strings. I want to use a Regex (findall) to search for any occurrence of each of the list's elements in a text file.

import re
name_list = ['friend', 'boy', 'man']
example_string = "friend"
with open('file.txt', 'r') as file:
    lines = file.read()

Then comes the re.findall expression. I configured it such that it finds any occurrence in the text file where a desired string is found between a number in parentheses (\d) and a period. It works perfectly when I place a string variable inside the regular expression, as seen below.

find = re.findall(r"([^(\d)]*?" + example_string + r"[^.]*)", lines)

However, I want to replace example_string with some mechanism that supplies each element of name_list as an individual string to be placed in the regular expression and searched for. The lists I work with can get much larger than the list in this example, so please keep that in mind.

As a beginner, I tried simply replacing the string in re.findall with the list I have, only to quickly realize that that would result in an error. The solution to this must allow me to use re.findall in the aforementioned manner, so most of the challenge lies in manipulating the list so that it can produce each of its elements as individual strings to be placed within re.findall.

Thank you for your insights.

CodePudding user response:

for name in name_list:
  find = re.findall(r"([^(\d)]*?" + name + r"[^.]*)", lines)
  # ... do stuff with the results

This iterates through each item in name_list and runs the same regex as before, once per name.
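
A self-contained sketch of the loop above, for anyone who wants to run it without a file. The sample string stands in for the contents of file.txt, and the `results` dictionary and the `re.escape` call are additions for illustration rather than part of the answer; `re.escape` only matters if a name contains regex metacharacters.

```python
import re

name_list = ['friend', 'boy', 'man']

# Assumption: this sample text stands in for the contents of file.txt.
lines = "this is (10) with friend and a text. and (2) a boy here."

# Collect the matches for each name separately (hypothetical structure).
results = {}
for name in name_list:
    # Same pattern as in the question, with the name spliced in.
    # re.escape keeps characters such as "." or "+" in a name literal.
    pattern = r"([^(\d)]*?" + re.escape(name) + r"[^.]*)"
    results[name] = re.findall(pattern, lines)

for name, found in results.items():
    print(name, found)
```

Each name costs one pass over the text, so a long list means many passes. A name with no match simply yields an empty list.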

CodePudding user response:

The pattern that you use, ([^(\d)]*?[^.]*), is not correct for this match.

You wrote: "I configured it such that it finds any occurrence in the text file where a desired string is found between a number in parentheses (\d) and a period."

It does not do that because of the construct [^(\d)]. This is a negated character class that matches any single character except the ones listed between the square brackets: "(", a digit, or ")". It does not match a number in parentheses.

The next negated character class, [^.]*, matches any characters except a dot, so the match stops before the final dot and does not include it.
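
A small sketch of the two points above, using made-up sample strings. The variable names are illustrative only.

```python
import re

# [^(\d)] is one character class: any single character that is
# not "(", not ")" and not a digit. The parentheses do not group here.
class_matches = re.findall(r"[^(\d)]", "a(1)b")
print(class_matches)   # ['a', 'b']

# To match a literal number in parentheses, escape the parentheses
# outside of a character class.
paren_matches = re.findall(r"\(\d+\)", "a(1)b(23)")
print(paren_matches)   # ['(1)', '(23)']

# [^.]* consumes everything up to, but not including, the first dot.
before_dot = re.search(r"[^.]*", "a text. more").group()
print(before_dot)      # a text
```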


To find everything between a number in parentheses and a dot, you can use a capture group; re.findall returns the contents of that group.

\(\d+\)([^.]*(?:friend|boy|man)[^.]*)\.

For example, if the content of file.txt is:

this is (10) with friend and a text.

Example code, assembling the words in a non-capturing group using '|'.join(name_list):

import re

name_list = ['friend', 'boy', 'man']
pattern = rf"\(\d+\)([^.]*(?:{'|'.join(name_list)})[^.]*)\."
with open('file.txt', 'r') as file:
    lines = file.read()
print(re.findall(pattern, lines))

Output

[' with friend and a text']