I am parsing some user input to make a basic Discord bot assigning roles and such. I am trying to generalize some code to reuse for different similar tasks (doing similar things in different categories/channels).
Generally, I am looking for a substring (the category), then taking the string after it as that category's value. I am looking line by line for my category, replacing the "category" substring and returning a stripped version. However, what I have now also removes every space in the "value" string.
Originally the string looks like this:
Gamertag : 00test gamertag
What I want to do is preserve the spaces in the value. The regex I am trying to write should match all non-alphanumeric characters up to the first alphanumeric character.
My current regex already matches non-alphanumeric characters, but I can't figure out how to match only the first group. It looks like it should be as simple as adding a ? to make the quantifier lazy, but I'm not sure. Example code and string are below (the regex I want to replace is in the final print statement).
String I am working with:
- 00test Gamertag #(or any non-alpha delimiter)
Desired Results (by matching and stripping the extra characters)
00test Gamertag #(remove leading space and any non-alpha characters before the first words)
It should be something like the following, which is close to what I use to strip non-alphanumerics now, but that matches every group rather than only the first. I want to match just the first group of non-alphanumerics in the string so I can strip that part using re.sub.
\W+?
https://www.online-python.com/gDVhZrnmlq
Thank you!
CodePudding user response:
Your regex will substitute the non-alphanumeric characters anywhere in the input string. If you only need this to happen at the start of the string, then use the start-of-input anchor (i.e. ^):
^\W+
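As a minimal sketch of the anchored substitution, assuming a + quantifier so that the whole leading run of non-word characters is removed rather than a single character (the sample string is taken from the question; the variable names are only illustrative):

```python
import re

raw = " - 00test Gamertag"

# ^ anchors the match at the start of the string, so only the leading
# run of non-word characters (spaces, hyphens, ...) is removed.
cleaned = re.sub(r"^\W+", "", raw)

print(cleaned)  # 00test Gamertag
```

Because the match is anchored, the space between the two words is left alone. Note that \W does not match an underscore, so a leading _ would be kept.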
CodePudding user response:
It depends on your inputs. You can use two regexes to achieve your goal: the first removes all non-alphanumeric characters from your string, including the ones between words, and the second collapses runs of more than one space between words into a single space:
import re
gamer_tag = "µ& - 00test - Gamertag"
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
gamer_tag = re.sub(r" +", " ", gamer_tag)
print(gamer_tag.strip())
# Output: 00test Gamertag
You can remove the second re.sub() if you are sure that there will be no more than one space between words.
gamer_tag = "- 00test Gamertag "
gamer_tag = re.sub(r"[^a-zA-Z0-9\s]", "", gamer_tag)
print(gamer_tag.strip())
# Output: 00test Gamertag
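Tying this back to the question's goal of reading a category's value line by line, here is one possible sketch. The function name extract_value, the case-insensitive matching, and the sample message are assumptions for illustration, not part of the original code:

```python
import re

def extract_value(message, category):
    """Return the text after `category` on the first line containing it, or None."""
    # Hypothetical helper: escape the category so it is matched literally.
    pattern = re.compile(re.escape(category), re.IGNORECASE)
    for line in message.splitlines():
        match = pattern.search(line)
        if match:
            rest = line[match.end():]
            # Strip only the leading delimiter run; inner spaces are kept.
            return re.sub(r"^\W+", "", rest).rstrip()
    return None

message = "Name : Bob\nGamertag : 00test gamertag\n"
print(extract_value(message, "gamertag"))  # 00test gamertag
```

Unlike removing every non-alphanumeric character, this only trims the delimiter in front of the value, so punctuation inside the value itself would survive.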