I'm trying to use a regular expression to find everything specified in my character set that appears after a delimiter (a colon).
Example:
Test3131:PythonBoolJava!Python
Overflow:PythonBoolFAKE!Python@021!
Overflo!w2:PythonBoolUnix-Python;?
Over3_flow:PythonBoolUnix^Python%
Desired output:
Test3131:PythonBoolJavaPython
Overflow:PythonBoolFAKEPython021
Overflo!w2:PythonBoolUnixPython
Over3_flow:PythonBoolUnixPython
So -
Ignore all data before and including the delimiter :
Search for all characters regardless of line position using the regex [\$&\ ,:;=\?@#\|'<>\.\^\*\(\)%!-]
Once matched, I would mark the matches in my dataset manually.
What I have tried:
.*:.*[\$&\ ,:;=\?@#\|'<>\.\^\*\(\)%!-]
However, this was to no avail.
CodePudding user response:
If available, for example through the PyPI regex module for Python, maybe:
(?::|\G(?!^)).*?\K[!#-.:-@^|]
See an online demo. Notice how I condensed your character list using the ASCII table to [!#-.:-@^|]. It still matches every character you listed except the space, and the range #-. additionally matches +.
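To see exactly how the condensed class relates to the original one, the two can be compared character by character over printable ASCII. This is only a quick sanity check using the built-in re module; the variable names are illustrative and not part of the original answer.

```python
import re

# Character class from the question and the condensed one from the answer.
original = r"[\$&\ ,:;=\?@#\|'<>\.\^\*\(\)%!-]"
condensed = r"[!#-.:-@^|]"

# Printable ASCII, from space (32) to tilde (126).
ascii_chars = [chr(code) for code in range(32, 127)]

orig_set = {c for c in ascii_chars if re.fullmatch(original, c)}
cond_set = {c for c in ascii_chars if re.fullmatch(condensed, c)}

only_in_original = sorted(orig_set - cond_set)
only_in_condensed = sorted(cond_set - orig_set)

print(only_in_original)   # characters the condensed class no longer matches
print(only_in_condensed)  # characters the condensed class matches in addition
```

If the space matters for your data, adding it back to the condensed class (for example [ !#-.:-@^|]) should restore it.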
(?: - Open non-capture group;
: - Match the first colon;
| - Or;
\G(?!^) - Assert position at end of previous match but exclude start-line;
) - Close non-capture group;
.*?\K - 0+ (lazy) characters up to the point where we reset the start of the reported match;
[!#-.:-@^|] - Any 1 of given characters.
Another option, if available (for example in JavaScript or through the PyPI regex module for Python), is a zero-width, variable-length lookbehind:
(?<=^[^:]*:.*?)[!#-.:-@^|]
See an online demo
(?<=^[^:]*:.*?) - Positive lookbehind to check that, before the current position, there is a start-line anchor, 0+ non-colon characters, a colon, and then any 0+ (lazy) characters;
[!#-.:-@^|] - Any 1 of given characters.
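The "if available" caveat matters in Python: the built-in re module supports neither \G, \K, nor variable-length lookbehind, which is why both patterns rely on the third-party regex module. A small check of that, with a hypothetical helper name:

```python
import re  # built-in module, used here only to show what it rejects

patterns = {
    "G_and_K": r"(?::|\G(?!^)).*?\K[!#-.:-@^|]",
    "lookbehind": r"(?<=^[^:]*:.*?)[!#-.:-@^|]",
}

def compiles_with_builtin_re(pattern):
    """Return True if the built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

support = {name: compiles_with_builtin_re(p) for name, p in patterns.items()}
print(support)
```

Both entries should come out as False, so the patterns need to be run with import regex rather than import re.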
Code sample for Python:
import regex as re  # third-party PyPI "regex" module, not the built-in re

# Raw input lines from the question
l_in = ["Test3131:PythonBoolJava!Python", "Overflow:PythonBoolFAKE!Python@021!", "Overflo!w2:PythonBoolUnix-Python;?", "Over3_flow:PythonBoolUnix^Python%"]
# Option 1: \G and \K
l_out1 = [re.sub(r"(?::|\G(?!^)).*?\K[!#-.:-@^|]", '', el) for el in l_in]
# Option 2: variable-length lookbehind
l_out2 = [re.sub(r"(?<=^[^:]*:.*?)[!#-.:-@^|]", '', el) for el in l_in]
print(l_out1, l_out2)
Prints:
['Test3131:PythonBoolJavaPython',
'Overflow:PythonBoolFAKEPython021',
'Overflo!w2:PythonBoolUnixPython',
'Over3_flow:PythonBoolUnixPython']
['Test3131:PythonBoolJavaPython',
'Overflow:PythonBoolFAKEPython021',
'Overflo!w2:PythonBoolUnixPython',
'Over3_flow:PythonBoolUnixPython']
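If installing the regex module is not an option, roughly the same result can be sketched with the standard library alone by splitting on the first colon and only touching the part after it. This is a different technique from the two patterns above, and the names SPECIALS, strip_after_colon and positions_after_colon are made up for illustration.

```python
import re

# Condensed character class from the answer; works with the built-in re.
SPECIALS = re.compile(r"[!#-.:-@^|]")

def strip_after_colon(line):
    """Remove the listed characters, but only after the first colon."""
    head, sep, tail = line.partition(":")
    if not sep:  # no colon at all: leave the line untouched
        return line
    return head + sep + SPECIALS.sub("", tail)

def positions_after_colon(line):
    """Return the indices of listed characters found after the first colon."""
    colon = line.find(":")
    if colon == -1:
        return []
    # finditer accepts a start position, so the search begins after the colon.
    return [m.start() for m in SPECIALS.finditer(line, colon + 1)]

lines = [
    "Test3131:PythonBoolJava!Python",
    "Overflow:PythonBoolFAKE!Python@021!",
    "Overflo!w2:PythonBoolUnix-Python;?",
    "Over3_flow:PythonBoolUnix^Python%",
]

cleaned = [strip_after_colon(line) for line in lines]
print(cleaned)
print(positions_after_colon(lines[2]))
```

The second helper only reports where the matches are, which may be closer to the original goal of marking them manually instead of deleting them.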