I want to find, in a binary buffer (bytes), all the "words" built from ASCII lowercase letters and digits that are exactly 5 characters long.
For example, the buffer
b'a\x1109ert\x01\x03a54bb\x05'
contains a54bb and 09ert.
Note that a string like abcdef121212 is longer than 5 characters, so I don't want it.
I have built this set:
import string
set([ord(i) for i in string.ascii_lowercase + string.digits])
What is the fastest way to do that using Python?
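For reference, here is one way a set of allowed byte values like the one above could be used directly: a plain Python scan that tracks runs of allowed bytes and keeps only the runs that are exactly 5 long. This is a sketch; the function name `find_words_scan` is made up, and because it loops byte by byte in the interpreter it is likely to be slower than a regex-based approach on large buffers.

```python
import string

# Byte values of the allowed characters: a-z and 0-9.
ALLOWED = set(ord(c) for c in string.ascii_lowercase + string.digits)


def find_words_scan(buffer: bytes, length: int = 5) -> list:
    """Return every maximal run of allowed bytes whose length is exactly `length`."""
    words = []
    start = None  # index where the current run began, or None when outside a run
    for i, b in enumerate(buffer):  # iterating over bytes yields ints
        if b in ALLOWED:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended; keep it only if it has the exact length.
            if i - start == length:
                words.append(buffer[start:i])
            start = None
    # A run may extend to the very end of the buffer.
    if start is not None and len(buffer) - start == length:
        words.append(buffer[start:])
    return words


print(find_words_scan(b'a\x1109ert\x01\x03a54bb\x05'))  # [b'09ert', b'a54bb']
```

Longer runs such as abcdef121212 are skipped entirely rather than cut into 5-byte pieces, because a run is only checked once it has ended.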
Answer:
My instinct would be to just go with regex here:
>>> import re
>>> buffer = b'a\x1109ert\x01\x03a54bb\x05'
>>> re.findall(b"[a-z0-9]{5}", buffer)
[b'09ert', b'a54bb']
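One caveat with the fixed-width pattern: {5} does not look at what surrounds the match, so it will happily slice 5-byte pieces out of a longer run. A quick check with the abcdef121212 example from the question shows this:

```python
import re

# {5} matches any 5 consecutive allowed bytes, even inside a longer run.
pieces = re.findall(b"[a-z0-9]{5}", b"abcdef121212")
print(pieces)  # [b'abcde', b'f1212']
```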
EDIT:
After your clarification, I would try just doing:
re.findall(b"[a-z0-9]+", buffer)
and then keeping only the matches that are exactly 5 bytes long:
[x for x in re.findall(b"[a-z0-9]+", buffer) if len(x) == 5]
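If you would rather let the regex engine do the length check, lookarounds can require that the 5 bytes are not preceded or followed by another allowed byte, so no separate filtering step is needed. The sketch below puts that single-pattern version next to the filter approach and times both with timeit; the function names and the test buffer are illustrative, and I have not benchmarked this, so which one wins will depend on your data (measure on a realistic buffer).

```python
import re
import timeit

# (?<![a-z0-9]) : the byte before the match is not allowed (or there is none)
# [a-z0-9]{5}   : exactly five allowed bytes
# (?![a-z0-9])  : the byte after the match is not allowed (or there is none)
WORD5 = re.compile(rb"(?<![a-z0-9])[a-z0-9]{5}(?![a-z0-9])")
RUN = re.compile(rb"[a-z0-9]+")


def find_words_regex(buffer: bytes) -> list:
    # Single pass; the lookarounds reject pieces of longer runs.
    return WORD5.findall(buffer)


def find_words_filter(buffer: bytes) -> list:
    # Grab whole runs, then keep those of length 5.
    return [x for x in RUN.findall(buffer) if len(x) == 5]


buffer = b'a\x1109ert\x01\x03a54bb\x05abcdef121212'
print(find_words_regex(buffer))   # [b'09ert', b'a54bb']
print(find_words_filter(buffer))  # [b'09ert', b'a54bb']

# Rough timing on a larger buffer built by repeating the sample.
data = buffer * 10_000
for fn in (find_words_regex, find_words_filter):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=5))
```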