How to extract the front and back of a designated special token using regex?

How can I extract the text in front of and behind a designated special token (in this case -, not @)? If more than two parts are joined by -, I want to extract all of them as well (in the example, Bill-Gates-Foundation).

E.g., from 'Meinda@Bill-Gates-Foundation@drug-delivery' -> ['Bill-Gates-Foundation', 'drug-delivery']

I tried p = re.compile('@(\D+)\*(\D+)')

but that was not what I wanted.

CodePudding user response：

You can exclude the @ char from the match and repeat the part starting with - 1 or more times:

@([^\s@-]+(?:-[^\s@-]+)+)

Explanation

@ Match literally
( Capture group 1 (returned by re.findall)
- [^\s@-]+ Match 1+ non-whitespace chars except - and @
- (?:-[^\s@-]+)+ Repeat 1+ times matching - and again 1+ non-whitespace chars except - and @
) Close group 1

Regex demo

import re

pattern = r"@([^\s@-]+(?:-[^\s@-]+)+)"
s = r"Meinda@Bill-Gates-Foundation@drug-delivery"
print(re.findall(pattern, s))

Output

['Bill-Gates-Foundation', 'drug-delivery']
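
One consequence of this pattern: because the trailing group has to match at least once, a segment that contains no - is skipped entirely. A minimal sketch of that behaviour follows. The input with the extra @solo segment and the name hyphenated are made up for illustration and are not from the question.

```python
import re

# Same pattern as in this answer; the trailing (?:-...)+ requires at least one hyphen.
hyphenated = re.compile(r"@([^\s@-]+(?:-[^\s@-]+)+)")

# Hypothetical input with an extra segment, "solo", that has no hyphen.
s = "Meinda@Bill-Gates-Foundation@solo@drug-delivery"

# "solo" is not returned because nothing after it starts with "-".
matches = hyphenated.findall(s)
print(matches)
```

If segments without a hyphen should be returned as well, changing the final + to * would probably be the smallest adjustment.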

CodePudding user response：

To extract the front and back of a designated special token (in this case, -, not @), you can use a regular expression with the re module.

Here is an example of how you can use a regular expression to extract the front and back of the - token in a given string:

import re

# The input string
string = 'Meinda@Bill-Gates-Foundation@drug-delivery'

# Use a regular expression to extract the front and back of the '-' token
p = re.compile(r'@([\w-]+)@([\w-]+)')
matches = p.findall(string=string)

# Print the matches
print(matches)

This code will print the following output:

[('Bill-Gates-Foundation', 'drug-delivery')]
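
Note that a two-group pattern like this returns a single tuple and only fits a string with exactly two @-prefixed segments in a row. If a flat list for any number of segments is wanted, one option is a single capture group, sketched below. The third segment in the input and the variable names text and flat are assumptions for illustration.

```python
import re

# Hypothetical input with three "@"-prefixed segments instead of two.
text = 'Meinda@Bill-Gates-Foundation@drug-delivery@third-part'

# One capture group: findall returns a flat list with one entry per "@" segment.
flat = re.findall(r'@([\w-]+)', text)
print(flat)
```

Unlike the first answer's pattern, this version also returns segments that contain no - at all.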

CodePudding user response：

@ahmet-buğra-buĞa gave an answer with regex.

If you don't have to use regex, an easier way is to just use split.

test_str = "Meinda@Bill-Gates-Foundation@drug-delivery"
test_str.split("@")[1:]

This outputs

['Bill-Gates-Foundation', 'drug-delivery']

You can make it a function like so

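def get_list_of_strings_after_first(original_str, token_to_split_on):
    # Split on the given token and drop the first piece (the text before the first token)
    return original_str.split(token_to_split_on)[1:]

get_list_of_strings_after_first("Meinda@Bill-Gates-Foundation@drug-delivery", "@")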

This gives the same output

['Bill-Gates-Foundation', 'drug-delivery']
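
Splitting keeps every segment after the first @, whether or not it contains a -. If only the hyphen-joined segments are wanted, as the question seems to ask, a filter can be added. This is a rough sketch: the function name hyphenated_segments and the input with the extra @solo segment are hypothetical.

```python
def hyphenated_segments(original_str, token_to_split_on="@"):
    # Drop the text before the first token, then keep only segments containing "-".
    segments = original_str.split(token_to_split_on)[1:]
    return [seg for seg in segments if "-" in seg]

# Hypothetical input with an un-hyphenated segment, "solo", which is filtered out.
result = hyphenated_segments("Meinda@Bill-Gates-Foundation@solo@drug-delivery")
print(result)
```

The check "-" in seg is looser than the regex answer: it would also keep a segment such as "a-" that merely starts or ends with a hyphen.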