I have a list of words and I want to use a regex to match every word that has no punctuation within the word, at the beginning of the word, or at the end of the word. I am at a loss for the regex needed, but I would like to use re.findall(). Below is an example.
Example sentence: 'I don't like to play baseball. I would rather take a nap instead.'
What I need it to match: like to play I would rather take a nap
Since there is an apostrophe in front of the first 'I', it is not matched, and 'don't' is excluded because of the apostrophe inside the word. Since there is a period at the end of 'baseball' and 'instead', they are also not matched.
CodePudding user response:
The following regex should work (note the leading space): " [a-zA-Z]+ ([a-zA-Z]+ )*"
It checks for a space, followed by one or more letters, followed by an additional space. It then checks for more words, each followed by a single space, so that consecutive words share the space between them and one match never takes a space that another match needs.
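As a rough sketch of how that pattern could be applied to the example sentence: this assumes the quantifiers are meant to be + (one or more letters, as described above) and swaps the group for a non-capturing one, because re.findall() returns only the group's contents when a pattern contains a capturing group. The names pattern, runs and words are illustrative only. Each match is a run of clean words, which is then split into individual words.

```python
import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"

# Space, one or more letters, space, then any number of further "letters + space" words.
# (?: ... ) is non-capturing, so findall returns the full text of each match.
pattern = r" [a-zA-Z]+ (?:[a-zA-Z]+ )*"

# Each run is a stretch of consecutive punctuation-free words, padded by spaces.
runs = re.findall(pattern, txt)

# Split every run on whitespace to recover the individual words.
words = [word for run in runs for word in run.split()]
print(words)
```

Because the pattern requires a space on both sides, a clean word at the very start or very end of the string would not be picked up by this approach.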
I would suggest using Regex101 to test and visualize regexes. It also has a bunch of useful information about what characters do what in regex, a debugger, and a panel breaking down each part of your inputted regex.
CodePudding user response:
import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"
new_txt = ""
# A token qualifies only if it consists entirely of letters
reg = r"^[a-zA-Z]+$"
words = txt.split()
for i, val in enumerate(words):
    if re.match(reg, val):
        new_txt += val
        # Add a separator unless this is the last token
        if i != len(words) - 1:
            new_txt += " "
print(new_txt)
Output:
like to play I would rather take a nap
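Since the question asks for re.findall() specifically, here is one possible single-pattern variant, offered as a sketch rather than as part of the answer above. It assumes that "word" means a whitespace-delimited token made up only of ASCII letters, and it uses lookarounds so that no surrounding spaces are consumed.

```python
import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"

# (?<!\S) : not preceded by a non-whitespace character (start of string or whitespace)
# (?!\S)  : not followed by a non-whitespace character (end of string or whitespace)
words = re.findall(r"(?<!\S)[a-zA-Z]+(?!\S)", txt)
print(" ".join(words))  # like to play I would rather take a nap
```

Because the lookarounds also succeed at the start and end of the string, a punctuation-free first or last word would be matched as well.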