How to restrict a list of strings to a pattern with regex?

Time:11-06

I composed a regex pattern to validate multiple strings. The pattern looks fine according to the regex documentation, but for some reason some invalid strings are accepted as valid. Can anyone point out my mistake?

test use case

This is a test case for one input string:

import re

usr_pat = r"^\$\w+_src_username_\w+$"
u_name='$ini_src_username_cdc_char4ec_pits'
m = re.match(usr_pat, u_name, re.M)
if m:
    print("Valid username:", m.group())
else:
    print("ERROR: Invalid user_name:\n", u_name)

I expected this to return an error, because the input string must start with a $ sign, followed by one word (\w+), then _, then src, then _, then username, then _, and end with exactly one word (\w+). That is how I composed my pattern, but for some reason the input is not validated as I expect. What did I miss?

desired output

These are the valid and invalid inputs:

valid:

$ini_src_usrname_ajkc2e
$ini_src_password_ajkc2e
$ini_src_conn_url_ajkc2e

invalid:

$ini_src_usrname_ajkc2e_chan4
$ini_src_password_ajkc2e_tst1
$ini_smi_src_conn_url_ajkc2e_tst2
ini_smi_src_conn_url_ajkc2e_tst2
$ini_src_usrname_ajkc2e_chan4_jpn3

According to the regex documentation, r"^\$\w+_src_username_\w+$" should capture the logic I want, but it does not work for all my test cases. What did I miss here? Thanks.

CodePudding user response:

The \w character class also matches underscores and numbers:

Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.

(https://docs.python.org/3/library/re.html#regular-expression-syntax).

So the final \w+ matches the entirety of cdc_char4ec_pits.

I think you are looking for [a-zA-Z0-9]+, which will not match underscores.

usr_pat = r"^\$[a-zA-Z0-9]+_src_username_[a-zA-Z0-9]+$"
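As a quick check, here is a minimal sketch that runs the stricter pattern against a few strings. The helper name is_valid_username and every sample string except the one from the question are made up for illustration.

```python
import re

# Stricter pattern: letters and digits only, so "_" cannot be absorbed.
USR_PAT = re.compile(r"^\$[a-zA-Z0-9]+_src_username_[a-zA-Z0-9]+$")

def is_valid_username(name: str) -> bool:
    """Hypothetical helper: True when `name` matches the stricter pattern."""
    return USR_PAT.match(name) is not None

samples = [
    "$ini_src_username_ajkc2e",            # single trailing segment
    "$ini_src_username_cdc_char4ec_pits",  # extra "_" segments at the end
    "ini_src_username_ajkc2e",             # missing leading "$"
    "$ini_smi_src_username_ajkc2e",        # extra segment before "_src_"
]

for s in samples:
    print(s, "->", "valid" if is_valid_username(s) else "invalid")
```

Only the first sample should be reported as valid, since each of the others either lacks the leading $ or contains an extra underscore-separated segment.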

CodePudding user response:

\w

First: \w matches exactly one character, which (for ASCII text) is one of:

1- one letter from a to z, or from A to Z

OR

2- one number from 0 to 9

OR

3- an underscore(_)

Second: The plus (+) sign after \w means the previous token is matched between one and unlimited times.

So if my regex pattern is: r"^\$\w+$"

It would match the string: '$ini_src_username_cdc_char4ec_pits'

1- ^\$ matches the dollar sign ($) at the beginning of the string.

2- \w+ first matches the character i of the word ini, and because of the + sign it goes on to match n and the second i. The underscore after ini is matched as well, because \w matches an underscore, not just a letter or a digit. The word src, the underscore after it, and the word username are matched in the same way, and so on until the whole string is matched.
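The walk-through above can be reproduced with capture groups. This is only an illustrative sketch; the groups are added here to show which part of the string each \w+ consumes and are not part of the original pattern.

```python
import re

s = "$ini_src_username_cdc_char4ec_pits"

# A single \w+ spans the whole name, because \w also matches "_".
whole = re.match(r"^\$(\w+)$", s)
print(whole.group(1))

# In the question's pattern, the trailing \w+ absorbs the extra segments.
m = re.match(r"^\$(\w+)_src_username_(\w+)$", s)
print(m.group(1), "|", m.group(2))
```

The second match shows why the question's string is accepted: everything after _src_username_, underscores included, lands in the last group.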

You mentioned the word "string". If you mean letters and numbers only, such as "bla123", "123455" or "BLAbla", then you can use something like [a-zA-Z0-9]+ instead of \w+.
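Applied to the valid and invalid lists from the question, one possible sketch looks like this. It assumes the middle part must be one of usrname, password or conn_url (taken from the sample strings, not stated as a rule in the question), and it uses re.fullmatch instead of ^...$ with re.match, which is a choice made here rather than something the answers proposed.

```python
import re

# Assumption: only these three field names are allowed in the middle.
PATTERN = re.compile(
    r"\$[a-zA-Z0-9]+_src_(?:usrname|password|conn_url)_[a-zA-Z0-9]+"
)

valid = [
    "$ini_src_usrname_ajkc2e",
    "$ini_src_password_ajkc2e",
    "$ini_src_conn_url_ajkc2e",
]
invalid = [
    "$ini_src_usrname_ajkc2e_chan4",
    "$ini_src_password_ajkc2e_tst1",
    "$ini_smi_src_conn_url_ajkc2e_tst2",
    "ini_smi_src_conn_url_ajkc2e_tst2",
    "$ini_src_usrname_ajkc2e_chan4_jpn3",
]

for s in valid + invalid:
    ok = PATTERN.fullmatch(s) is not None
    print(s, "->", "valid" if ok else "invalid")
```

The field names are listed explicitly because conn_url itself contains an underscore, so a plain [a-zA-Z0-9]+ in that position would reject it.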
