I am trying to make a simple Discord bot that responds to some user input, and I'm having difficulty parsing the response for the info I need. I want to get the user's "gamertag"/username, but the format is a little different from message to message.
So, my idea was to:
1. Make a list of delimiter words to look for (different versions of the word gamertag, such as `Gamertag:`, `Gamertag -`, `username`, etc.).
2. Look line by line for a line that contains any of those delimiters.
3. Split that line on the first matching delimiter and strip the non-alphanumeric characters.
I had it kind of working for a single line, then realized some people don't put it on the first line, so I added a line-by-line check and broke it (on line 19, I just realized). I also suspect there must be a better way than this. Please advise. Some partially working code is at this link and copied below:
```python
testString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""
applicationString = testString
gamertagSplitList = [ "gamertag", "Gamertag","Gamertag:", "gamertag:"]
#splWord = 'Gamertag'
lineNum = 0
# partition('\n') splits only at the FIRST newline (before, '\n', after),
# so this loop does not visit every line
for line in applicationString.partition('\n'):
    print(line)
    # this tests whether the whole line equals a delimiter, not whether it contains one
    if line in gamertagSplitList:
        applicationString = line
        break
#get first line
#applicationString = applicationString.partition('\n')[0]
res = ""
#split on word, want to split on first occurrence of list of words
for splitWord in gamertagSplitList:
    if splitWord in applicationString:
        # split() without maxsplit cuts at EVERY occurrence, and "testGamertag" also contains "Gamertag"
        res = applicationString.split(splitWord)
        break
splitString = res[1]
#res = test_string.split(spl_word, 1)
#splitString = res[1]
#get rid of non alphaNum characters
finalString = "" #define string for output
for character in splitString:
    if(character.isalnum()):
        # if character is alphanumeric concat to finalString
        finalString = finalString + character
print(finalString)
```
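For comparison, here is a minimal sketch of the plan described above with the line loop fixed: it walks the text with `splitlines()` instead of `partition()`, compares delimiters case-insensitively, keeps only what follows the first occurrence of a delimiter, and then filters to alphanumeric characters. The function name `extract_gamertag` and the delimiter list passed to it are hypothetical, chosen for illustration.

```python
def extract_gamertag(text, delimiters):
    """Return the cleaned value after the first matching delimiter, or None.

    Delimiters are compared case-insensitively, so "gamertag" also
    matches "Gamertag" and "GAMERTAG".
    """
    for line in text.splitlines():  # one iteration per line
        lowered = line.lower()
        for delim in delimiters:
            index = lowered.find(delim.lower())  # first occurrence only
            if index == -1:
                continue
            # keep everything after the first occurrence of the delimiter
            value = line[index + len(delim):]
            # drop separators such as ':' or '-' and any spaces
            cleaned = "".join(ch for ch in value if ch.isalnum())
            if cleaned:  # skip lines where nothing usable follows the delimiter
                return cleaned
    return None


testString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""

print(extract_gamertag(testString, ["gamertag", "username"]))
```

Like the original step of stripping non-alphanumeric characters, this also removes underscores or spaces that are part of a tag itself.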
Answer:
I don't know if this will work with all your different inputs, but you can tweak it to get what you want:
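```python
import re

gamertagSplitList = ["gamertag", "Gamertag", "Gamertag:", "gamertag:"]
applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""

gamer_tag = None
for line in applicationString.split('\n'):
    line = line.replace(' ', '')  # drop spaces so "Gamertag : x" becomes "Gamertag:x"
    for tag in gamertagSplitList:
        if tag in line:
            gamer_tag = line.replace(tag, '', 1)  # remove only the first occurrence of the tag
            break
    if gamer_tag is not None:
        break  # stop at the first line that contains a tag

# strip remaining non-word characters such as ':' or '-'
print(re.sub(r'\W+', '', gamer_tag))
```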
Output:
```
testGamertag
```
Answer:
You can do it without any loops with a single regex:
```python
import re

gamertagSplitList = ["gamertag", "Gamertag"]
applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""

print(re.search(r'(' + '|'.join(gamertagSplitList) + r')\s*[:-]?\s*(\w+)\s*', applicationString)[2])
```
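One caveat with the two-label regex above: `re.search` returns `None` when nothing matches, so indexing the result with `[2]` raises a `TypeError` for an application that has no gamertag line. Below is a hedged sketch that guards against that and also passes each label through `re.escape`, so labels containing characters such as `.` or `+` are matched literally. The helper name `find_gamertag` is hypothetical.

```python
import re


def find_gamertag(text, labels):
    """Return the word following any of `labels`, or None if there is no match."""
    # re.escape makes each label match literally, even if it contains regex metacharacters
    alternatives = '|'.join(re.escape(label) for label in labels)
    pattern = r'(' + alternatives + r')\s*[:-]?\s*(\w+)'
    match = re.search(pattern, text)
    if match is None:
        return None
    return match[2]  # group 2 is the (\w+) part after the separator


applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""

print(find_gamertag(applicationString, ["gamertag", "Gamertag"]))
print(find_gamertag("Discord - testDiscord", ["gamertag", "Gamertag"]))
```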
If all values in `gamertagSplitList` differ just by casing, you can simplify that even further:
```python
print(re.search(r'gamertag\s*[:-]?\s*(\w+)\s*', applicationString, re.IGNORECASE)[1])
```
Let's take a closer look at this regex:
- `gamertag` matches the literal string "gamertag".
- `\s*` matches zero or more whitespace characters (space, newline, tab, etc.).
- `[:-]?` matches either nothing or a single character that is `:` or `-`.
- `(\w+)` matches one or more word characters (letters, digits, and underscore). The parentheses make it a group: a specific substring that we can extract from the match later.
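The breakdown above can also live inside the pattern itself: with `re.VERBOSE`, unescaped whitespace in the pattern is ignored and `#` starts a comment, so each piece carries its own explanation. This is a sketch of the same single-label, case-insensitive pattern; the constant name `GAMERTAG_RE` is made up for illustration.

```python
import re

# Same pattern as the re.IGNORECASE one-liner, written out piece by piece
GAMERTAG_RE = re.compile(r"""
    gamertag   # the literal label
    \s*        # optional whitespace (spaces, tabs, newlines)
    [:-]?      # an optional ':' or '-' separator
    \s*        # more optional whitespace
    (\w+)      # group 1: one or more word characters, the gamertag itself
""", re.IGNORECASE | re.VERBOSE)

applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""

match = GAMERTAG_RE.search(applicationString)
print(match[1])
```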
By using `re.IGNORECASE` we make matching case-insensitive, so a separator such as `GaMeRtAg` will also be recognised by this pattern.
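A quick way to see the effect of the flag is to run the same pattern against a few differently cased labels, with and without it (the sample strings are made up for illustration):

```python
import re

pattern = r'gamertag\s*[:-]?\s*(\w+)'

for text in ["Gamertag: alpha", "GaMeRtAg - beta", "GAMERTAG gamma"]:
    # without the flag, only the all-lowercase label "gamertag" would match
    case_sensitive = re.search(pattern, text)
    # with re.IGNORECASE every casing of the label matches
    case_insensitive = re.search(pattern, text, re.IGNORECASE)
    print(text, "|", case_sensitive is not None, "|", case_insensitive[1])
```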
The indexing part `[1]` means that we're interested in the first group in our pattern (remember the parentheses). The group with index 0 is always the full match, and groups from index 1 upwards are the substrings matched by the parenthesized subexpressions, numbered in the order in which their opening `(` appears in the regex.
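Applied to the earlier two-label pattern, where `[2]` was used, the numbering works out like this: the label alternation is the first `(`, so it is group 1, and the tag is group 2. A short sketch on a single sample line:

```python
import re

text = "Gamertag : testGamertag"
match = re.search(r'(gamertag|Gamertag)\s*[:-]?\s*(\w+)', text)

print(match[0])        # group 0, the full match: Gamertag : testGamertag
print(match[1])        # group 1, first "(": Gamertag
print(match[2])        # group 2, second "(": testGamertag
print(match.group(2))  # same as match[2]
print(match.groups())  # all numbered groups: ('Gamertag', 'testGamertag')
```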