I have a list of tokens stored in a list called tokens. I want to get only the tokens that follow a pattern such as 'a-b', 'a-b c' or 'a-b-c'.
For example: 'python-object'
Currently I am using this code:
for token in tokens:
    if "-" in token:
        print(token)
But this also returns tokens such as '-python', which I don't want.
CodePudding user response:
Similar to Cubix's answer, you can use [^-] rather than \w to make the expression robust against multiple dashes in the input.
The pattern in this case would be:
r"^[^-]+-[^-]+$"
And here's some sample code to show it works:
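import re

good_token = 'a-b'
bad_token = 'a-b-c'

# Matches: exactly one dash with non-dash characters on both sides
if re.match(r"^[^-]+-[^-]+$", good_token):
    print(good_token)

# Does not match: the token contains a second dash
if re.match(r"^[^-]+-[^-]+$", bad_token):
    print(bad_token)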
This outputs:
a-b
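As a rough sketch of how the same pattern could be applied to a whole list: the sample tokens list and the name dash_pair below are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical sample data; the real tokens list comes from the question.
tokens = ["python-object", "a-b", "a-b c", "a-b-c", "-python"]

# One or more non-dash characters, exactly one dash,
# then one or more non-dash characters up to the end of the string.
dash_pair = re.compile(r"^[^-]+-[^-]+$")

matching = [token for token in tokens if dash_pair.match(token)]
print(matching)  # ['python-object', 'a-b', 'a-b c']
```

Note that 'a-b c' is kept because a space is not a dash, while 'a-b-c' is dropped because it has two dashes. If you want to keep 'a-b-c' as well, this pattern is too strict.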
CodePudding user response:
Using regex:
import re

for token in tokens:
    # Raw string so that \w is passed to the regex engine unchanged
    if re.match(r"^\w+-\w", token):
        print(token)
^ : Start of the string.
\w+ : Matches any word character (\w) between one and unlimited times.
- : Matches a literal -.
\w : Matches any word character.
This regex checks that the token starts with something of the form a-b; what comes after doesn't matter.
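To see which tokens this kind of prefix check lets through, here is a small sketch. The samples list and the variable names are assumptions for illustration only; 'python-' is added as an extra edge case that the question does not mention.

```python
import re

# Hypothetical sample tokens, including the unwanted '-python' from the question.
samples = ["python-object", "a-b", "a-b c", "a-b-c", "-python", "python-"]

# Word characters, a dash, then at least one more word character.
starts_with_pair = re.compile(r"^\w+-\w")

accepted = [s for s in samples if starts_with_pair.match(s)]
rejected = [s for s in samples if not starts_with_pair.match(s)]

print(accepted)  # ['python-object', 'a-b', 'a-b c', 'a-b-c']
print(rejected)  # ['-python', 'python-']
```

Because only the start of the token is checked, all three shapes from the question pass, and a token with a leading or trailing dash does not.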
CodePudding user response:
If I understand correctly, this is what you want.
tokens = ["python-object", "a-b", "a-b c", "a-b-c"]
for token in tokens:
    if "-" in token:
        # Split on the dash and rejoin the pieces with spaces
        splited = token.split('-')
        joined = " ".join(splited)
        print(joined)
Output:
python object
a b
a b c
a b c
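If tokens such as '-python' should also be skipped, one possible variation (a sketch only; the helper name has_inner_dash and the two extra sample tokens are hypothetical) is to check the ends of the token before splitting:

```python
# Hypothetical sample data: the list above plus two tokens that should be skipped.
tokens = ["python-object", "a-b", "a-b c", "a-b-c", "-python", "python-"]


def has_inner_dash(token):
    """Return True if token contains a dash but neither starts nor ends with one."""
    return "-" in token and not token.startswith("-") and not token.endswith("-")


kept = [token for token in tokens if has_inner_dash(token)]
for token in kept:
    print(" ".join(token.split("-")))
```

This avoids regular expressions entirely, at the cost of not checking anything else about the token, for example repeated dashes such as 'a--b'.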