Find all words in binary buffer using Python

Time:01-03

I want to find, in a binary buffer (bytes), all the "words" made up of ASCII lowercase letters and digits that are exactly 5 characters long.

For example:

bytes(b'a\x1109ert\x01\x03a54bb\x05') contains a54bb and 09ert.

Note that the string abcdef121212 is longer than 5 characters, so I don't want it.

I have built this set:

set([ord(i) for i in string.ascii_lowercase + string.digits])

What is the fastest way to do that using Python?

CodePudding user response:

My instinct would be to just go with regex here:

>>> import re
>>> buffer = b'a\x1109ert\x01\x03a54bb\x05'
>>> re.findall(b"[a-zA-Z0-9]{5}", buffer)
[b'09ert', b'a54bb']
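
One caveat with the fixed-width pattern, shown here as a small check rather than as part of the original answer: {5} also matches five-byte slices of a longer run, so the abcdef121212 string from the question still produces matches. The character class is narrowed to [a-z0-9] on the assumption that only lowercase letters and digits count, as the question states.

```python
import re

# The 12-byte run from the question, which should produce no matches at all.
long_run = b"abcdef121212"

# {5} consumes the run in 5-byte slices instead of rejecting it as too long.
chunks = re.findall(b"[a-z0-9]{5}", long_run)
print(chunks)  # [b'abcde', b'f1212']
```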

EDIT:

After your clarification, I would try just doing:

re.findall(b"[a-zA-Z0-9]+", buffer)

And then filtering for bytes of exactly length 5, so:

[x for x in re.findall(b"[a-zA-Z0-9]+", buffer) if len(x) == 5]
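
As a minimal sketch of the approach above, the following wraps the filter in a function and, for comparison, adds a lookaround variant that rejects longer runs inside the regex itself. The function names and the sample buffer are hypothetical, and the character class is narrowed to [a-z0-9] on the assumption that uppercase letters should not count, as the question states. Which variant is faster depends on the data, so it is worth measuring both with timeit on a representative buffer.

```python
import re

# Assumption: a "word" is a maximal run of ASCII lowercase letters and digits.
RUN = re.compile(rb"[a-z0-9]+")

# Lookarounds reject a 5-byte match that is adjacent to another word byte.
EXACT5 = re.compile(rb"(?<![a-z0-9])[a-z0-9]{5}(?![a-z0-9])")


def words_by_filter(buf):
    """Collect every maximal run, then keep those of length exactly 5."""
    return [w for w in RUN.findall(buf) if len(w) == 5]


def words_by_lookaround(buf):
    """Let the regex itself exclude 5-byte slices of longer runs."""
    return EXACT5.findall(buf)


# Hypothetical sample: two valid words plus the too-long run from the question.
sample = b"a\x1109ert\x01\x03a54bb\x05abcdef121212"
print(words_by_filter(sample))      # [b'09ert', b'a54bb']
print(words_by_lookaround(sample))  # [b'09ert', b'a54bb']
```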