Regex to match part of a hex string

I need to use a regex to match part of a hexadecimal string, but that part varies from one input to the next. Let me explain:

I have this hex data:

70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78

In this case I need to match only the f2, but the value will not always be f2: every data sample is different. The only parts that are always the same are the '00 00 00' and the '78' at the end. Everything else is random.

I managed to write the following regex: /(?=00 00 00).+?(?=78)/ The output is:

00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0

But I don't know how to build a regex that takes only the 'f2' (reminder: it will not always be f2).

Any thoughts?

CodePudding user response：

Is the f2 surrounded by asterisks?

Without asterisks:

00 00 00 [a-f0-9]+ (?<hexits>[a-f0-9]+).+78

With asterisks:

\*(?<hexits>[a-f0-9]+)\*
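
If you are using Python, note that its re module spells named groups as (?P<name>...) rather than (?<name>...). Below is a minimal sketch of the version without asterisks, assuming one-or-more (+) quantifiers after each character class. The shortened sample string and the variable names are made up for illustration:

```python
import re

# Shortened, made-up sample based on the tail of the data in the question.
sample = '77 25 00 00 00 20 f2 90 c2 91 c0 78'

# Python's re module uses (?P<name>...) for named groups.
pattern = re.compile(r'00 00 00 [a-f0-9]+ (?P<hexits>[a-f0-9]+).+78')

m = pattern.search(sample)
hexits = m.group('hexits') if m else None
print(hexits)
```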

CodePudding user response：

You can use the following regex to match the hexadecimal value you want: /00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})/. It skips the byte that immediately follows "00 00 00" (20 in your example) and captures the one after it. The value you want is in the first capturing group.

Here is a demo:

import re

s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'

# Skip one byte after "00 00 00", then capture the next byte
match = re.search(r'00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})', s)
if match:
    print(match.group(1))

The output will be:

f2

CodePudding user response：

You don't really need a regex for that. Find the offset of three zero bytes in a row and take the byte at offset + 4, i.e. the second byte after the zeros:

s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'
s2 = '01 02 03 00 00 00 05 06 07'

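def locate(s):
    # bytes.fromhex() skips the spaces between the hex pairs
    data = bytes.fromhex(s)
    # index of the first run of three zero bytes
    offset = data.find(bytes([0, 0, 0]))
    # skip the three zeros plus one more byte
    return data[offset + 4]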

print(f'{locate(s):02X}')
print(f'{locate(s2):02X}')

Output:

F2
06

You could also extract the "f2" string directly from the string:

offset = s.index('00 00 00')
print(s[offset + 12 : offset + 14]) # 'f2'
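
One caveat: bytes.find() returns -1 when the three zero bytes are missing, so data[offset + 4] would silently return an unrelated byte, while str.index() raises ValueError. A possible defensive variant is sketched below. The name locate_safe and the choice to return None are assumptions for illustration, not part of the answer above:

```python
from typing import Optional

MARKER = bytes([0, 0, 0])

def locate_safe(hex_string: str) -> Optional[int]:
    """Return the second byte after the first 00 00 00 run, or None."""
    data = bytes.fromhex(hex_string)
    offset = data.find(MARKER)
    # find() returns -1 when the marker is absent; also make sure
    # the wanted byte actually exists before indexing.
    if offset == -1 or offset + 4 >= len(data):
        return None
    return data[offset + 4]

print(locate_safe('01 02 03 00 00 00 05 06 07'))
print(locate_safe('01 02 03'))
```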

CodePudding user response：

Given the explanation in this comment, the regex that you need is:

(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}

Providing the first input string from the question, this regex matches f2 (no spaces around it).

How it works:

(?<=                 # start of a positive lookbehind
  00 00 00           # match the exact string "00 00 00 " (including the trailing space)
  [0-9a-f]           # match one hex digit (lowercase only)
  {2}                # repeat the previous item twice (i.e. two hex digits)
                     # there is a literal space after "{2}", before ")"
)                    # end of the lookbehind
[0-9a-f]{2}          # match two hex digits

The positive lookbehind works like a non-capturing group, except that the text it matches is not part of the overall match. In other words, the matching part ([0-9a-f]{2}) matches only where it is immediately preceded by text that matches the lookbehind expression.

The matching part of the expression is [0-9a-f]{2} (i.e. two hex digits).
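
As a rough illustration in Python, re.search with this lookbehind returns only the two hex digits as the match. Python's re module accepts the lookbehind here because it has a fixed width. The shortened sample string is made up from the tail of the question's data:

```python
import re

# Shortened, made-up sample based on the tail of the data in the question.
sample = '77 25 00 00 00 20 f2 90 c2 91 c0 78'

# The lookbehind must be fixed-width in Python's re module; this one is
# always 12 characters long ("00 00 00 ", two hex digits, one space).
m = re.search(r'(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}', sample)

matched = m.group(0) if m else None
print(matched)
```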

If the input may contain uppercase hex digits, add the i flag (or whatever flag your regex engine uses for case-insensitive matching) so that the a-f part of the regex also matches A-F. If you cannot (or do not want to) use this flag, write [0-9A-Fa-f] everywhere instead.
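
In Python, for instance, the flag is re.IGNORECASE (also available as re.I). The sketch below uses a made-up uppercase sample to show the difference the flag makes:

```python
import re

# Made-up sample with uppercase hex digits.
upper_sample = '77 25 00 00 00 20 F2 90 C2 91 C0 78'
pattern = r'(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}'

# Without the flag, a-f does not match A-F, so nothing is found.
without_flag = re.search(pattern, upper_sample)

# With re.IGNORECASE the same pattern also accepts uppercase digits.
with_flag = re.search(pattern, upper_sample, re.IGNORECASE)

print(without_flag)
print(with_flag.group(0) if with_flag else None)
```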

If your regex engine does not support lookbehind you can get the same result using capturing groups:

00 00 00 [0-9a-f]{2} ([0-9a-f]{2})

Applied to the same input, this regex matches 00 00 00 20 f2, and its first (and only) capturing group contains f2.