Find character selection after specified delimiter

Time:08-07

I'm trying to use a regular expression to find EVERY character specified in my character set that appears after a delimiter (a colon).

Example:

Test3131:PythonBoolJava!Python
Overflow:PythonBoolFAKE!Python@021!
Overflo!w2:PythonBoolUnix-Python;?
Over3_flow:PythonBoolUnix^Python%

Desired output:

Test3131:PythonBoolJavaPython
Overflow:PythonBoolFAKEPython021
Overflo!w2:PythonBoolUnixPython
Over3_flow:PythonBoolUnixPython

So:
Ignore all data before and including the delimiter :
Search for all characters, regardless of line position, using the regex [\$&\ ,:;=\?@#\|'<>\.\^\*\(\)%!-]

Once a character is matched, I would mark it in my dataset manually.

What I have tried:

.*:.*[\$&\ ,:;=\?@#\|'<>\.\^\*\(\)%!-]

However, this was to no avail.

CodePudding user response:

If it is available to you (for example, through the regex module on PyPI for Python), maybe:

(?::|\G(?!^)).*?\K[!#-.:-@^|]

Notice how I condensed your character list, using the ASCII table, to [!#-.:-@^|]. It still matches all the characters you have given.

  • (?: - Open non-capture group;
    • : - Match the first colon;
    • | - Or;
    • \G(?!^) - Assert position at the end of the previous match, but exclude the start of the line;
    • ) - Close non-capture group;
  • .*?\K - 0+ (lazy) characters, up to the point where \K resets the starting point of the reported match;
  • [!#-.:-@^|] - Any one of the given characters.
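
To see exactly which characters the condensed class [!#-.:-@^|] covers, the ranges can be expanded with a quick check. This is only an illustrative sketch using Python's built-in re module; the names condensed and covered are made up for this example.

```python
import re
import string

# The condensed class from the answer: '!' plus the ASCII ranges
# '#' through '.' and ':' through '@', plus '^' and '|'.
condensed = re.compile(r"[!#-.:-@^|]")

# Keep every ASCII punctuation character that the class accepts.
covered = "".join(ch for ch in string.punctuation if condensed.fullmatch(ch))

print(covered)
```

Comparing the printed characters against your own list is an easy way to confirm that a range-based class matches what you intended and nothing you did not.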

Another option, if available (for example in JavaScript, or through Python's regex module from PyPI), is a zero-width lookbehind:

(?<=^[^:]*:.*?)[!#-.:-@^|]

  • (?<=^[^:]*:.*?) - Positive lookbehind to assert that the match is preceded by the start-of-line anchor, 0+ non-colon characters, a colon, and then 0+ (lazy) characters;
  • [!#-.:-@^|] - Any one of the given characters.
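
Python's built-in re module rejects a variable-width lookbehind like the one above, which is why the third-party regex module is suggested. If the goal is only to locate the characters so they can be marked by hand, a minimal stdlib-only sketch is to start the scan just past the first colon. This is a different technique from the lookbehind, and the helper name special_positions is hypothetical.

```python
import re

# Condensed character class from the answer above.
SPECIAL = re.compile(r"[!#-.:-@^|]")

def special_positions(line):
    """Return (index, character) pairs for special characters after the first colon."""
    colon = line.find(":")
    if colon == -1:
        return []  # no delimiter on this line, so nothing to report
    # The second argument (pos) makes the scan begin right after the delimiter.
    return [(m.start(), m.group()) for m in SPECIAL.finditer(line, colon + 1)]

for line in ["Test3131:PythonBoolJava!Python", "Overflo!w2:PythonBoolUnix-Python;?"]:
    print(line, special_positions(line))
```

The indices refer to positions in the original line, so the ! before the colon in the second example is never reported.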

Code sample for Python:

import regex as re  # third-party module from PyPI; the built-in re lacks \G, \K and variable-width lookbehind

# Input lines from the question
l_in = ["Test3131:PythonBoolJava!Python", "Overflow:PythonBoolFAKE!Python@021!", "Overflo!w2:PythonBoolUnix-Python;?", "Over3_flow:PythonBoolUnix^Python%"]
l_out1 = [re.sub(r"(?::|\G(?!^)).*?\K[!#-.:-@^|]", '', el) for el in l_in]
l_out2 = [re.sub(r"(?<=^[^:]*:.*?)[!#-.:-@^|]", '', el) for el in l_in]

print(l_out1, l_out2)

Prints:

['Test3131:PythonBoolJavaPython',
 'Overflow:PythonBoolFAKEPython021',
 'Overflo!w2:PythonBoolUnixPython',
 'Over3_flow:PythonBoolUnixPython']
['Test3131:PythonBoolJavaPython',
 'Overflow:PythonBoolFAKEPython021',
 'Overflo!w2:PythonBoolUnixPython',
 'Over3_flow:PythonBoolUnixPython']
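
If installing the regex module is not an option, the same result can probably be reached with the standard library alone by splitting each line at the first colon and cleaning only the part after it. This is a sketch of an alternative approach, not the method used above, and strip_after_colon is a hypothetical helper name.

```python
import re

# Condensed character class from the answer above.
SPECIAL = re.compile(r"[!#-.:-@^|]")

def strip_after_colon(line):
    """Remove special characters that appear after the first colon."""
    head, sep, tail = line.partition(":")
    if not sep:
        return line  # no delimiter: leave the line untouched
    # Everything before and including the first colon is kept as-is.
    return head + sep + SPECIAL.sub("", tail)

lines = [
    "Test3131:PythonBoolJava!Python",
    "Overflow:PythonBoolFAKE!Python@021!",
    "Overflo!w2:PythonBoolUnix-Python;?",
    "Over3_flow:PythonBoolUnix^Python%",
]
cleaned = [strip_after_colon(line) for line in lines]
print(cleaned)
```

Because the class itself contains a colon, any further colons after the first one are removed as well, which matches the behaviour of the two patterns above.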