Split string with multiple possible delimiters to get substring

I am trying to make a simple Discord bot that responds to some user input, and I am having difficulty parsing the response for the info I need. I am trying to get their "gamertag"/username, but the format is a little different sometimes.

So, my idea was to make a list of delimiter words I am looking for (different versions of the word gamertag such as Gamertag:, Gamertag -, username, etc.)

Then, look line by line for one that contains any of those delimiters.

Then split the string on the first matching delimiter and strip non-alphanumeric characters.

I had it kind of working for a single line, then realized some people don't put it on the first line, so I added a line-by-line check and messed it up (on line 19, I just realized). I also thought there must be a better way than this. Please advise; the partly working code is copied below:

testString = """Application

Gamertag : testGamertag

Discord - testDiscord

Age - 25"""

 
applicationString = testString
 
gamertagSplitList = [ "gamertag", "Gamertag","Gamertag:", "gamertag:"]
#splWord = 'Gamertag'
lineNum = 0

    
for line in applicationString.partition('\n'):
    print(line)
    if line in gamertagSplitList:
        applicationString = line 
        break
    
#get first line

#applicationString = applicationString.partition('\n')[0]

 
res = ""
#split on word, want to split on first occurrence of list of words
for splitWord in gamertagSplitList:
    if splitWord in applicationString:
        res = applicationString.split(splitWord)
        break
    
splitString = res[1] 
#res = test_string.split(spl_word, 1)
#splitString = res[1]

#get rid of non alphaNum characters
finalString = "" #define string for ouput

for character in splitString:
    if(character.isalnum()):
        # if character is alphanumeric concat to finalString
        finalString = finalString + character

print(finalString)

CodePudding user response：

Don't know if this will work with all your different inputs, but you can tweak it to get what you want :

import re


gamertagSplitList = ["gamertag", "Gamertag", "Gamertag:", "gamertag:"]

applicationString = """Application

Gamertag : testGamertag

Discord - testDiscord

Age - 25"""


for line in applicationString.split('\n'):
    line = line.replace(' ', '')
    for tag in gamertagSplitList:
        if tag in line:
            gamer_tag = line.replace(tag, '', 1)
            break

print(re.sub(r'\W+', '', gamer_tag))

Output :

testGamertag
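
One thing to watch with the loop above: the break only leaves the inner loop, and gamer_tag is never assigned if no line contains a tag, so the final print would raise a NameError. Here is a minimal sketch of the same line-by-line idea wrapped in a function. It assumes that matching the keyword case-insensitively is acceptable and that returning None is a reasonable way to signal "not found". The name find_gamertag and the keywords tuple are made up for illustration.

```python
import re


def find_gamertag(text, keywords=("gamertag", "username")):
    """Return the value after the first keyword found, or None."""
    for line in text.splitlines():
        for keyword in keywords:
            # re.escape keeps characters such as ':' or '-' in a
            # keyword from being read as regex syntax.
            match = re.search(re.escape(keyword), line, re.IGNORECASE)
            if match:
                # Keep only what follows the keyword, then drop
                # separators such as ':', '-' and spaces.
                rest = line[match.end():]
                return re.sub(r'\W+', '', rest)
    return None


sample = """Application

Gamertag : testGamertag

Discord - testDiscord

Age - 25"""

print(find_gamertag(sample))
print(find_gamertag("no tag here"))
```

Returning from inside the loop stops at the first matching line, so later lines cannot overwrite the result.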

CodePudding user response：

You can do it without any loops with a single regex:

import re

gamertagSplitList = ["gamertag", "Gamertag"]
applicationString = """Application

Gamertag : testGamertag

Discord - testDiscord

Age - 25"""

print(re.search(r'(' + '|'.join(gamertagSplitList) + r')\s*[:-]?\s*(\w+)\s*', applicationString)[2])

If all values in gamertagSplitList differ just by casing, you can simplify that even further:

print(re.search(r'gamertag\s*[:-]?\s*(\w+)\s*', applicationString, re.IGNORECASE)[1])

Let's take a closer look at this regex:

gamertag matches the literal string 'gamertag'.

\s* matches any number of whitespace characters, including none (space, newline, tab, etc.).

[:-]? matches either nothing or a single character that is either : or -.

(\w+) matches one or more word characters (letters, digits, or underscore). The parentheses denote a group: a specific substring that we can extract from the match later.

The trailing \s* consumes any whitespace after the name; it does not change what the group captures.

By using re.IGNORECASE we make matching case insensitive, so that separator GaMeRtAg will also be recognised by this pattern.
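
A quick check of the flag, as a sketch using a few made-up input strings (they are not from the question):

```python
import re

pattern = r'gamertag\s*[:-]?\s*(\w+)'
inputs = ("Gamertag : testGamertag",
          "GAMERTAG-testGamertag",
          "GaMeRtAg testGamertag")

# With re.IGNORECASE every spelling of the keyword matches.
results = [re.search(pattern, text, re.IGNORECASE)[1] for text in inputs]
print(results)

# Without the flag only the exact lowercase spelling would match,
# so re.search returns None here.
no_flag = re.search(pattern, "Gamertag : testGamertag")
print(no_flag)
```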

The indexing part [1] means that we're interested in the first group in our pattern (remember the parentheses). The group with index 0 is always the full match, and groups from index 1 upwards represent the substrings that match the subexpressions in parentheses (ordered by where their ( appears in the regex).
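
To make the group numbering concrete, and because re.search returns None when nothing matches (so indexing the result directly would raise a TypeError), here is a small sketch. The helper name extract_name, the extra "username" keyword and the sample text are assumptions for illustration only.

```python
import re

# Group 1 is the keyword, group 2 is the value that follows it.
pattern = re.compile(r'(gamertag|username)\s*[:-]?\s*(\w+)', re.IGNORECASE)


def extract_name(text):
    """Return the captured name, or None if no keyword is present."""
    match = pattern.search(text)
    if match is None:
        return None
    return match[2]


match = pattern.search("Username - testUser\nAge - 25")
print(repr(match[0]))  # the full match: 'Username - testUser'
print(match[1])        # first group, the keyword: Username
print(match[2])        # second group, the value: testUser

print(extract_name("Age - 25"))  # no keyword, so None
```

In a bot, checking for None before indexing lets you reply with something like "couldn't find a gamertag" instead of crashing on an unexpected message format.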