I have a long string like this:
[left-ctrl]bhbhbhblbhbhblbhblblbhbl[left-ctrl][left-ctrl]blbhblbhblbbjbjbjblblbhblbhbhblbk[left-ctrl][left-ctrl]bhblblbjbjbkbjbkbjbkbkbh[left-ctrl]kkkkkk[left-cmd][tab][left-cmd][del]su[del][del]cut [del][left-shift];[left-shift][del]s[left-shift];
The actual string is much longer (160,000 chars). I want to treat [...] as a single char, like b, h, ... . How?
Edited:
The problem is that single [ and ] characters can also appear on their own, in runs like [[[[[ or ]]]]]]. My current idea is to use some library to find in advance the positions of control codes like [left-ctrl], [cmd], ... . Then I would use a cursor to loop through the string and take special care whenever the cursor reaches one of these positions. But this might take multiple passes to find all the special positions, so I'm wondering whether there is a simpler way to do this efficiently. As for using a library, I'm just too lazy to implement the KMP algorithm myself.
Note that a single ] without a matching [ can also appear, e.g. [left-ctrl]]hbh[left-ctrl].
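The cursor idea does not necessarily need a separate pre-search pass: str.startswith accepts a start index, so at each cursor position you can test the known control codes directly and otherwise consume one character. A minimal single-pass sketch, assuming the hypothetical CODES list below (taken from the sample string) is the complete set of codes; tokenize_with_cursor is an illustrative name, not an existing API:

```python
# Hypothetical list of control codes, taken from the sample string;
# the real data may contain more codes than these.
CODES = ["[left-ctrl]", "[left-cmd]", "[left-shift]", "[tab]", "[del]"]


def tokenize_with_cursor(text, codes=CODES):
    """Split text into tokens, treating each known code as one token."""
    tokens = []
    pos = 0
    while pos < len(text):
        # str.startswith(prefix, start) checks for a match at the cursor
        # without slicing the string.
        for code in codes:
            if text.startswith(code, pos):
                tokens.append(code)
                pos += len(code)
                break
        else:
            # No code starts here: emit one ordinary character,
            # including stray "[" or "]".
            tokens.append(text[pos])
            pos += 1
    return tokens


print(tokenize_with_cursor("[left-ctrl]]hbh[left-ctrl]"))
```

Because every code ends in ], none of them is a prefix of another, so the order of CODES does not affect the result here. The cost is roughly the string length times the number of codes, in a single pass.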
Answer:
I'm not sure I completely understand, but you could try something like this:
import re

re_content = re.compile(r'\[+(.*?)\]+|([^\[\]]+)')
string = ...  # The string
for match in re_content.finditer(string):
    in_brackets, not_in_brackets = match.groups()
    if in_brackets:  # Do the special stuff
        print(f'In brackets: {in_brackets}')
    if not_in_brackets:  # Do the normal stuff
        print(f'Not in brackets: {not_in_brackets}')
Output for
string = 'abc[left-ctrl]lbhbl[left-ctrl]]bhblbk[left-ctrl]bkbh[left-ctrl]kkkkkk[left-cmd][tab][del]su[del][del]cut [del][left-shift];[left-shift][del]s[left-shift];[[[[[del]]]]]]'
is
Not in brackets: abc
In brackets: left-ctrl
Not in brackets: lbhbl
In brackets: left-ctrl
Not in brackets: bhblbk
In brackets: left-ctrl
Not in brackets: bkbh
In brackets: left-ctrl
Not in brackets: kkkkkk
In brackets: left-cmd
In brackets: tab
In brackets: del
Not in brackets: su
In brackets: del
In brackets: del
Not in brackets: cut
In brackets: del
In brackets: left-shift
Not in brackets: ;
In brackets: left-shift
In brackets: del
Not in brackets: s
In brackets: left-shift
Not in brackets: ;
In brackets: del
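The loop above yields whole runs of ordinary text. If each bracketed code should count as one "char" next to single ordinary characters, as the question asks, the two groups can be flattened into one token list. A minimal sketch using the same pattern; tokenize_runs is a hypothetical name. Note that the pattern re-wraps the content in a single pair of brackets, so [[[[[del]]]]]] becomes [del]:

```python
import re

# Same pattern as above: one or more "[", the shortest content, one or
# more "]"; or else a run of characters containing no brackets at all.
re_content = re.compile(r'\[+(.*?)\]+|([^\[\]]+)')


def tokenize_runs(text):
    """Return a flat list: "[code]" for bracketed parts, single chars otherwise."""
    tokens = []
    for match in re_content.finditer(text):
        in_brackets, not_in_brackets = match.groups()
        if in_brackets is not None:
            # Re-wrap so codes stay distinguishable from ordinary letters.
            tokens.append(f'[{in_brackets}]')
        else:
            # Split the plain run into individual characters.
            tokens.extend(not_in_brackets)
    return tokens


print(tokenize_runs('ab[left-ctrl]]c'))
```

One caveat: a stray bracket that is not part of a [...] group matches neither alternative, so finditer skips it and it does not appear in the output at all (e.g. 'a]b' gives ['a', 'b']).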
Answer:
Option 1: Manual parsing
You could define a generator function that accumulates the bracketed characters and yields each special key as a single string when its closing bracket is found:
def keyCodes(iKeys):
    special = ""
    for c in iKeys:
        if c == "[":                        # start of special char
            if special: yield from special  # flush prev. individual chars
            special = c
        elif c == "]" and special:          # closing bracket
            special += c
            if len(special) > 2: yield special  # special code
            else: yield from special            # empty [], not a code
            special = ""                        # reset
        elif special:
            special += c                    # accumulate special code
        else:
            yield c                         # return simple chars
    yield from special                      # flush trailing chars
Usage:
keys = "[left-ctrl]bhbhbhblbhbhblbhblblbhbl[left-ctrl][left-ctrl]blbhblbhblbbjbjbjblblbhblbhbhblbk[left-ctrl][left-ctrl]bhblblbjbjbkbjbkbjbkbkbh[left-ctrl]kkkkkk[left-cmd][tab][left-cmd][del]su[del][del]cut [del][left-shift];[left-shift][del]s[left-shift];"
for code in keyCodes(keys):
    print(code)
Output:
[left-ctrl]
b
h
b
h
b
h
b
l
b
h
b
h
b
l
b
h
b
l
b
l
b
h
b
l
[left-ctrl]
[left-ctrl]
b
l
b
h
b
l
b
h
b
l
b
b
j
b
j
b
j
b
l
b
l
b
h
b
l
b
h
b
h
b
l
b
k
[left-ctrl]
[left-ctrl]
b
h
b
l
b
l
b
j
b
j
b
k
b
j
b
k
b
j
b
k
b
k
b
h
[left-ctrl]
k
k
k
k
k
k
[left-cmd]
[tab]
[left-cmd]
[del]
s
u
[del]
[del]
c
u
t
[del]
[left-shift]
;
[left-shift]
[del]
s
[left-shift]
;
Note that the condition (if len(special)>2) that decides whether a bracketed sequence is output as one string or as individual characters should probably check against a list of valid special key codes instead (e.g. if special in specialCodes). Otherwise, some key patterns may be returned as special codes when they are not (e.g. [xxx] or [@]).
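A minimal sketch of that whitelist check, assuming a hypothetical SPECIAL_CODES set (replace it with the real list of codes); key_codes_checked is an illustrative variant of keyCodes, not part of the original answer:

```python
# Hypothetical whitelist of valid codes; adjust to the real data.
SPECIAL_CODES = {"[left-ctrl]", "[left-cmd]", "[left-shift]", "[tab]", "[del]"}


def key_codes_checked(keys, special_codes=SPECIAL_CODES):
    """Like keyCodes, but only yields bracketed text found in special_codes."""
    special = ""
    for c in keys:
        if c == "[":
            yield from special      # flush a pending unmatched "[..."
            special = c
        elif c == "]" and special:
            special += c
            if special in special_codes:
                yield special       # recognised key code
            else:
                yield from special  # unknown, e.g. "[xxx]" or "[@]"
            special = ""
        elif special:
            special += c            # accumulate a candidate code
        else:
            yield c                 # ordinary character
    yield from special              # flush trailing unmatched chars


print(list(key_codes_checked("a[xxx][del]]")))
```

Unknown bracketed text such as [xxx] comes back as individual characters, and the stray ] after [del] is yielded on its own.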
Option 2: General regular expression pattern
If you don't mind using a library, the same result can be obtained using a regular expression:
import re

for code in re.findall(r'\[[^\]\[]+\]|.', keys):
    print(code)
The expression has 2 parts, separated by the pipe (|) operator and tried in order at each position:
\[[^\]\[]+\] : at least one character between brackets (excluding other brackets).
. : any single character.
Like the previous solution, this may return invalid special codes for key sequences such as [abc].
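To see how the general pattern handles the stray brackets mentioned in the question, here is a small check on made-up inputs; split_keys is a hypothetical helper name. Adding re.DOTALL is an extra assumption not in the answer: without it, . does not match a newline, so findall would silently skip any newlines in the log.

```python
import re

# \[[^\]\[]+\] : "[" + one or more non-bracket chars + "]"  (a key code)
# .            : otherwise, any single character (stray brackets included)
KEY_PATTERN = re.compile(r'\[[^\]\[]+\]|.', re.DOTALL)


def split_keys(keys):
    """Return key codes and single characters in order of appearance."""
    return KEY_PATTERN.findall(keys)


print(split_keys("[left-ctrl]]hbh[left-ctrl]"))
print(split_keys("[[[[[del]]]]]]"))
print(split_keys("[]x"))
```

In "[[[[[del]]]]]]" only the innermost [del] qualifies as a code, because the character class excludes [ and ]; every other bracket is returned on its own, and an empty [] is also split into two characters.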
Option 3: Specific regular expression pattern
If you do have a list of the valid special codes, you can build a regular expression to extract them specifically:
import re

specialCodes = ['[tab]', '[left-ctrl]', '[left-shift]',
                '[del]', '[left-cmd]']
keyCodes = re.compile("|".join(re.escape(c) for c in specialCodes) + "|.")
for code in keyCodes.findall(keys):
    print(code)
The pattern is built with the pipe (|) operator so that the special codes are tried first, and it ends with a catch-all single character (.) for normal keystrokes.
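A self-contained version of the specific pattern, run on a made-up input containing an unknown code. Sorting the codes longest-first and passing re.DOTALL are extra precautions not in the answer above: alternation takes the first alternative that matches, so the ordering only matters if one code could be a prefix of another (not the case for these bracketed codes), and DOTALL keeps newlines as tokens. The name keyPattern is illustrative.

```python
import re

# Hypothetical list of valid codes; adjust to the real data.
specialCodes = ['[tab]', '[left-ctrl]', '[left-shift]', '[del]', '[left-cmd]']

# Longest codes first, each escaped, then "." as the catch-all for one char.
alternatives = sorted(specialCodes, key=len, reverse=True)
keyPattern = re.compile("|".join(map(re.escape, alternatives)) + "|.", re.DOTALL)

tokens = keyPattern.findall("[xxx][tab]a]")
print(tokens)
```

Since [xxx] is not in the list, it is split into single characters, while [tab] stays one token.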
Regular expressions can sometimes be slow, so I would suggest comparing the performance of these options on your data if processing time matters.
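One way to run that comparison is a small timeit harness. This is a sketch under assumptions: the sample string from the question is repeated to reach roughly the 160,000 characters mentioned, and the three options are re-implemented as shown above. It first checks that all three produce the same tokens, then prints timings; the numbers will depend on your machine and Python version, so none are claimed here.

```python
import re
import timeit

SAMPLE = ("[left-ctrl]bhbhbhblbhbhblbhblblbhbl[left-ctrl]"
          "[left-ctrl]blbhblbhblbbjbjbjblblbhblbhbhblbk[left-ctrl]"
          "[left-ctrl]bhblblbjbjbkbjbkbjbkbkbh[left-ctrl]kkkkkk"
          "[left-cmd][tab][left-cmd][del]su[del][del]cut [del]"
          "[left-shift];[left-shift][del]s[left-shift];")
# Repeat to get roughly the 160,000 characters mentioned in the question.
DATA = SAMPLE * (160_000 // len(SAMPLE) + 1)

SPECIAL_CODES = ['[tab]', '[left-ctrl]', '[left-shift]', '[del]', '[left-cmd]']
RUNS = 5


def option1(keys):
    """Manual parser from Option 1 (len(special) > 2 check)."""
    special = ""
    for c in keys:
        if c == "[":
            yield from special
            special = c
        elif c == "]" and special:
            special += c
            if len(special) > 2:
                yield special
            else:
                yield from special
            special = ""
        elif special:
            special += c
        else:
            yield c
    yield from special


GENERAL = re.compile(r'\[[^\]\[]+\]|.')
SPECIFIC = re.compile("|".join(map(re.escape, SPECIAL_CODES)) + "|.")

candidates = {
    "option1 (manual)": lambda: list(option1(DATA)),
    "option2 (general regex)": lambda: GENERAL.findall(DATA),
    "option3 (specific regex)": lambda: SPECIFIC.findall(DATA),
}

# All three should agree on this input before timing means anything.
results = [f() for f in candidates.values()]
assert all(r == results[0] for r in results)

for name, f in candidates.items():
    seconds = timeit.timeit(f, number=RUNS)
    print(f"{name}: {seconds / RUNS * 1000:.1f} ms per run")
```

On data that contains unknown codes such as [xxx], the three options would no longer agree, so the equality check is only meaningful for inputs where every bracketed sequence is a valid code.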