How to create regex to match a string that contains only hexadecimal numbers and arrows?-CodePudding

I am using a string that uses the following characters:

0-9    
a-f    
A-F    
Greater Than, Hyphen

The mixture of the greater than and hyphen must be:

->
-->

Here is the regex that I have so far:

[0-9a-fA-F\-\>]

I saw on multiple websites that to have an exclusion(g-zG-Z, other special characters) you would use ^ but I tried to do the following and it did not work:

[^g-zG-Z][0-9a-fA-F\-\>] 
^g-zG-Z[0-9a-fA-F\-\>] 
[0-9a-fA-F\-\>]^g-zG-Z 
[0-9a-fA-F\-\>] ^g-zG-Z
[0-9a-fA-F\-\>] [^g-zG-Z]

Just to be extra clear since this is a larger question I want my regex to contain:

a-z
A-Z
0-9
Hyphen and Greater than in the order of "->" or "-->"

I test all of the regex using:

import re as regex
regex.match(regex.compile("REGEX"),"String")

Here are some samples:

"0912adbd->12d1829-->218990d"
"ab2c8d-->82a921->193acd7"

CodePudding user response：

Firstly, you don't need to escape - and >

Here's the regex that worked for me:
^([0-9a-fA-F]*(->)*(-->)*)*$
Here's an alternative regex:
^([0-9a-fA-F]*(- >)*)*$

What does the regex do?

^ matches the beginning of the string and $ matches the ending.
* matches 0 or more instances of the preceding token
Created a big () capturing group to match any token.
[0-9a-fA-F] matches any character that is in the range.
(->) and (-->) match only those given instances.

Putting it into a code:

import re
regex = "^([0-9a-fA-F]*(\-\>)*(\-\-\>)*)*$"
re.match(re.compile(regex),"0912adbd->12d1829-->218990d")
re.match(re.compile(regex),"ab2c8d-->82a921->193acd7")
re.match(re.compile(regex),"this-failed->so-->bad")

You can also convert it into a boolean:

print(bool(re.match(re.compile(regex),"0912adbd->12d1829-->218990d")))
print(bool(re.match(re.compile(regex),"ab2c8d-->82a921->193acd7")))
print(bool(re.match(re.compile(regex),"this-failed->so-->bad")))

Output:

True
True
False

I recommend using regexr.com to check your regex.

CodePudding user response：

If there must be an arrow present, and not at the start or end of the string using a case insensitive pattern:

^[a-f\d] (?:-{1,2}>[a-f\d] ) $

Explanation

^ Start of string
[a-f\d] Match 1 chars a-f or digits
(?: Non capture group to repeat as a whole
- -{1,2}>[a-f\d] Match - or -- and > followed by 1 chars a-f or digits
) Close the non capture group and repeat 1 times
$ End of string

See a regex demo and a Python demo.

import re

pattern = r"^[a-f\d] (?:-{1,2}>[a-f\d] ) $"
s = ("0912adbd->12d1829-->218990d\n"
            "ab2c8d-->82a921->193acd7\n"
            "test")
print(re.findall(pattern, s, re.I | re.M))

Output

[
  '0912adbd->12d1829-->218990d',
  'ab2c8d-->82a921->193acd7'
]

CodePudding user response：

It's probably more intuitive to re.split() on the "arrows" and then check that all the resulting strings are purely hexadecimal:

import re

def check_hex_arrows(inp):
    parts = re.split(r"-{1,2}>", inp)
    return all(re.match(r"^[0-9a-fA-F]*$", p) for p in parts)

Testing:

test_strings = ["0912adbd->12d1829-->218990d",
                "ab2c8d-->82a921->193acd7",
                "0912adbd",
                "0912adbd-12345",
                "abcfdefghi",
                "abcfdef----->123",
                "abcfdefghi----->123" ]

for t in test_strings:
    print(t, check_hex_arrows(t))

gives:

0912adbd->12d1829-->218990d True
ab2c8d-->82a921->193acd7 True
0912adbd True
0912adbd-12345 False
abcfdefghi False
abcfdef----->123 False
abcfdefghi----->123 False