Pattern Matching in string

I have a list of tokens in a list called tokens. I want to get only the tokens that follow a pattern like 'a-b', 'a-b c' or 'a-b-c', for example 'python-object'. Currently I am using this code:

for token in tokens:
   if "-" in token:
       print(token)

But this also prints tokens such as '-python', which I don't want.

CodePudding user response：

Similar to Cubix's answer, you can use [^-] rather than \w to make the expression robust against multiple dashes in the input:

The pattern in this case would be:

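r"^[^-]+-[^-]+$"

And here's some sample code to show it works:

import re
good_token = 'a-b'
bad_token = 'a-b-c'
# Exactly one dash with non-dash characters on both sides: matches.
if re.match(r"^[^-]+-[^-]+$", good_token):
    print(good_token)
# A second dash breaks the [^-]+ run before $, so this does not match.
if re.match(r"^[^-]+-[^-]+$", bad_token):
    print(bad_token)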

This outputs:

a-b
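If tokens with more than one dash, such as 'a-b-c', should pass as well (the question lists them as wanted), the same idea can probably be extended by repeating the dash-plus-segment group. This is only a minimal sketch, assuming every segment between dashes must be non-empty; the name DASHED and the sample tokens are made up for illustration:

```python
import re

# Assumption: a wanted token is two or more non-empty segments
# joined by single dashes. DASHED is a hypothetical name.
DASHED = re.compile(r"^[^-]+(?:-[^-]+)+$")

tokens = ["python-object", "a-b", "a-b c", "a-b-c",
          "-python", "python-", "a--b", "plain"]

# Keep only the tokens that match the whole pattern.
matches = [t for t in tokens if DASHED.match(t)]
print(matches)  # ['python-object', 'a-b', 'a-b c', 'a-b-c']
```

Leading dashes, trailing dashes and doubled dashes are all rejected, because each of them would leave an empty segment.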

CodePudding user response：

Using regex:

import re

for token in tokens:
    if re.match(r"^\w+-\w", token):
        print(token)

^: Start of the string.
\w: Matches any word character.
+: Matches the preceding token (\w) between one and unlimited times.
-: Matches -.
\w: Matches any word character.

This regex only checks that the token starts with something like a-b; what comes after doesn't matter.
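Since only the start of the token is checked, a token with a trailing dash such as 'a-b-' also passes. If that matters, one option is re.fullmatch with a pattern that describes the whole token. The following is a rough sketch for comparison; the sample tokens are invented, and the stricter pattern is an assumption about what counts as a valid token:

```python
import re

tokens = ["python-object", "a-b c", "a-b-c", "-python", "a-b-", "a-"]

# re.match anchors only at the start, so anything may follow the first "x-y".
prefix_hits = [t for t in tokens if re.match(r"\w+-\w", t)]

# re.fullmatch must consume the whole token: word characters, then one or
# more "-word" groups, so a trailing dash is rejected.
full_hits = [t for t in tokens if re.fullmatch(r"\w+(?:-\w+)+", t)]

print(prefix_hits)  # ['python-object', 'a-b c', 'a-b-c', 'a-b-']
print(full_hits)    # ['python-object', 'a-b-c']
```

Note that the stricter version also drops 'a-b c', because a space is not a word character; whether that is wanted depends on the tokens.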

CodePudding user response：

If I understand correctly, this is what you want.

tokens = ["python-object", "a-b", "a-b c", "a-b-c"]
for token in tokens:
   if "-" in token:
       splited = token.split('-')
       joined = " ".join(splited)
       print(joined)

Output:

python object
a b
a b c
a b c
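Note that this loop would still print a token like '-python', since it only checks that a dash is present. Without regex, one way to skip such tokens is to require that every piece produced by split is non-empty. A small sketch of that check; is_dashed is a hypothetical helper name and the extra tokens are added for illustration:

```python
def is_dashed(token):
    # Hypothetical helper: True when the token has at least one dash
    # and no empty piece before, after, or between dashes.
    parts = token.split("-")
    return len(parts) > 1 and all(parts)

tokens = ["python-object", "a-b", "a-b c", "a-b-c",
          "-python", "python-", "plain"]

kept = [t for t in tokens if is_dashed(t)]
print(kept)  # ['python-object', 'a-b', 'a-b c', 'a-b-c']
```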