Regex: Repeat entire group 0 or more times (one or more words separated by +'s)

I am trying to match words separated with the + character as input from a user in Python and check whether each of the words is in a predetermined list. I am having trouble creating a regular expression to match these words (words are comprised of more than one A-z characters). For example, the input string foo should match, as should foo+bar and foo+bar+baz, with each of the words (not the +'s) being captured.

So far, I have tried a few regular expressions but the closest I have got is this:

/^([A-z]+)\+([A-z]+)$/

However, this only matches the case in which there are exactly two words separated by a +; I need it to accept one or more words. My approach above would work if I could somehow repeat the second group (\+([A-z]+)) zero or more times. So my question is: how can I repeat a capturing group zero or more times?
If there is a better way to do what I am doing, please let me know.

CodePudding user response：

You could write the pattern as:

(?i)[A-Z]+(?:\+[A-Z]+)*$

Explanation

(?i) Inline modifier for case-insensitive matching
[A-Z]+ Match 1+ chars A-Z
(?:\+[A-Z]+)* Optionally repeat matching a literal + and again 1+ chars A-Z
$ End of string
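
On the underlying question of repeating a capturing group: in Python's re module, a capturing group that is repeated keeps only the text of its last iteration, which is why the pattern above uses a non-capturing group and leaves the splitting to ordinary string code. A minimal sketch of that behaviour follows; the sample string and variable names are illustrative and not taken from the question.

```python
import re

sample = "foo+bar+baz"  # illustrative input, not from the question

# Repeating a capturing group: group 2 is overwritten on every iteration,
# so only the last repeated word survives.
m = re.match(r"^([A-Za-z]+)(?:\+([A-Za-z]+))*$", sample)
print(m.groups())  # first word, then only the final repeated word

# To get every word, validate the whole string first, then split on "+".
valid = re.fullmatch(r"[A-Za-z]+(?:\+[A-Za-z]+)*", sample)
words = sample.split("+") if valid else []
print(words)
```

Here m.groups() holds 'foo' and 'baz' but not 'bar', while the validate-then-split step returns all three words.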

See a regex101 demo for the matches:

For example

import re

predeterminedList = ["foo", "bar"]

strings = [
    "foo",
    "foo+bar",
    "foo+bar+baz",
    "test+abc"
]

# One or more letters, optionally followed by "+letters" groups, to end of string
pattern = r"(?i)[A-Z]+(?:\+[A-Z]+)*$"

for s in strings:
    # re.match anchors at the start of the string, so no leading ^ is needed
    m = re.match(pattern, s)
    if m:
        words = m.group().split("+")
        # True if at least one word is in the predetermined list
        intersect = bool(set(words) & set(predeterminedList))
        fmt = ','.join(predeterminedList)
        if intersect:
            print(f"'{s}' contains at least one of '{fmt}'")
        else:
            print(f"'{s}' contains none of '{fmt}'")

Output

'foo' contains at least one of 'foo,bar'
'foo+bar' contains at least one of 'foo,bar'
'foo+bar+baz' contains at least one of 'foo,bar'
'test+abc' contains none of 'foo,bar'
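
The question asks whether each word is in the predetermined list, whereas the example above reports whether at least one is. A possible variant for the stricter check is sketched below; the helper name all_words_allowed and the ALLOWED set are hypothetical, and the comparison is case-sensitive (lower-case the input first if that is not wanted).

```python
import re

ALLOWED = {"foo", "bar"}  # stands in for the predetermined list
PATTERN = re.compile(r"[A-Za-z]+(?:\+[A-Za-z]+)*")

def all_words_allowed(text, allowed=ALLOWED):
    """Return True only if text is well-formed and every word is in allowed."""
    if not PATTERN.fullmatch(text):
        return False  # malformed input such as "foo+" or "+foo"
    return set(text.split("+")) <= set(allowed)  # subset test

for s in ["foo", "foo+bar", "foo+bar+baz", "foo+", "test+abc"]:
    print(s, all_words_allowed(s))
```

With these inputs only foo and foo+bar pass: foo+bar+baz fails because baz is not in the set, and foo+ fails the format check.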

CodePudding user response：

NOTE: A-z in your [A-z]+ does not mean only the capital letters A to Z and the small letters a to z; it also includes the other characters that fall in that range, namely [ \ ] ^ _ and `. See an ASCII table. I think you mean [A-Za-z]+.
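
A quick way to see the difference is to test the six ASCII characters that sit between Z and a against both character classes. The variable names below are illustrative.

```python
import re

between = "[\\]^_`"  # the six ASCII characters between 'Z' (90) and 'a' (97)

# Characters accepted by each class
wide = [c for c in between if re.fullmatch(r"[A-z]", c)]
narrow = [c for c in between if re.fullmatch(r"[A-Za-z]", c)]

print(wide)    # all six characters are accepted by [A-z]
print(narrow)  # none are accepted by [A-Za-z]

# Consequence: a "word" containing an underscore slips through [A-z]+
print(bool(re.fullmatch(r"[A-z]+", "foo_bar")))
print(bool(re.fullmatch(r"[A-Za-z]+", "foo_bar")))
```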

Try this regex pattern:

^(?![\s\S]*\+$)(?:[A-Za-z]+\+?)+$

^ start of the string.
(?![\s\S]*\+$) ensures the end of the string is not a literal +.
(?:[A-Za-z]+\+?)+ non-capturing group: [A-Za-z]+\+? matches one or more letters followed by an optional literal +; this group is repeated at least once.
$ end of the string.

See regex demo

import re
txt = 'foo+bar+baz'
# With no capturing groups, findall returns a list of whole matches
arr = re.findall(r'^(?![\s\S]*\+$)(?:[A-Za-z]+\+?)+$', txt)

if arr:
    arr = arr[0].split('+')

print(arr)

# Output: ['foo', 'bar', 'baz']
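
As a sanity check, the lookahead pattern can be compared with a form that requires a + before every word after the first, which rules out a trailing + without a lookahead. This is only a sketch over a handful of hand-picked samples, not a proof that the two patterns are equivalent; the sample list and variable names are illustrative.

```python
import re

with_lookahead = re.compile(r"^(?![\s\S]*\+$)(?:[A-Za-z]+\+?)+$")
without_lookahead = re.compile(r"^[A-Za-z]+(?:\+[A-Za-z]+)*$")

samples = ["foo", "foo+bar", "foo+bar+baz", "foo+", "+foo", "foo++bar", ""]

results = {}
for s in samples:
    a = bool(with_lookahead.match(s))
    b = bool(without_lookahead.match(s))
    results[s] = (a, b)
    print(repr(s), a, b)
```

On these samples both patterns accept foo, foo+bar and foo+bar+baz and reject the rest.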