Regex to select words without any punctuation in Python

Time:07-28

I have a list of words and I want to use a regex to match every word that has no punctuation inside it, at its beginning, or at its end. I am at a loss for the regex needed, but I would like to use re.findall(). Below is an example.

Example sentence: 'I don't like to play baseball. I would rather take a nap instead.'

What I need it to match: like to play I would rather take a nap

Since the phrase begins with an apostrophe, the first 'I' is not matched. Since 'baseball' and 'instead' are each followed by a period, they are not matched either.

CodePudding user response:

The following regex should work (note the leading space): " [a-zA-Z]+ ([a-zA-Z]+ )*"

It checks for a space, followed by one or more letters, followed by an additional space. It then checks for more words using an overlapping space to prevent one match taking another match's space.
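As a rough sketch of how this pattern behaves with re.findall(), assuming it is written with + quantifiers and a leading space: re.findall() returns only the group's text when the pattern contains a capturing group, so the group below is made non-capturing with (?:...). The variable names are just for illustration. Because the pattern requires a space on both sides, a clean word at the very start or end of the string would not be picked up.

```python
import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"

# Non-capturing group (?:...) so that findall returns the whole match
# instead of only the last repetition of the group.
pattern = r" [a-zA-Z]+ (?:[a-zA-Z]+ )*"

# Each match is a run of consecutive punctuation-free words,
# including the surrounding spaces.
runs = re.findall(pattern, txt)

# Split every run into its individual words.
words = [word for run in runs for word in run.split()]

print(runs)
print(words)
```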

I would suggest using Regex101 to test and visualize regexes. It also has a bunch of useful information about what characters do what in regex, a debugger, and a panel breaking down each part of your inputted regex.

CodePudding user response:

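import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"
new_txt = ""
# A token qualifies only if it consists entirely of ASCII letters
reg = r"^[a-zA-Z]+$"

words = txt.split()
for i, val in enumerate(words):
    if re.match(reg, val):
        new_txt += val

        # Add a separating space unless this is the last token
        if i != len(words) - 1:
            new_txt += " "
print(new_txt)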

Output

like to play I would rather take a nap
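
Since the question asked for re.findall(), the same filtering could probably be done in one call. This is a minimal sketch that swaps in a different technique (lookarounds) from the loop above, and it assumes words are separated by whitespace: (?<!\S) requires that the character before the word is not a non-space character, and (?!\S) requires the same of the character after it, so the start and end of the string count as boundaries too.

```python
import re

txt = "'I don't like to play baseball. I would rather take a nap instead.'"

# (?<!\S)  no non-whitespace character directly before the word
# [a-zA-Z]+ one or more ASCII letters
# (?!\S)   no non-whitespace character directly after the word
pattern = r"(?<!\S)[a-zA-Z]+(?!\S)"

matches = re.findall(pattern, txt)
print(" ".join(matches))
```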