I have following values in a postgresql table. How can I extract all the string before '(' wherever it is present
col1
BCR-ABL (translocation) | MLL-AF4 (translocation) | E2A-PBX1 (translocation) | TEL-AML1 (translocation) | c-MYC (rearrangement) | CRLF2 (rearrangement) | PAX5 (rearrangement)
EVC1 | EVC2
EBF3
TMCO1
The desired output is:
col1
BCR-ABL | MLL-AF4 | E2A-PBX1 | TEL-AML1 | c-MYC | CRLF2 | PAX5
EVC1 | EVC2
EBF3
TMCO1
I am using following regex to extract that, but it is only excluding the last (
^(.*) \(.*
CodePudding user response:
Try:
SELECT REGEXP_REPLACE(col1, '\s*\([^()]*\)', '', 'g') AS col2 FROM test
See an online fiddle and a regex demo.
\s*
- 0 (Greedy) whitespace characters;\([^()]*\)
- 0 non-paranthesis between escaped literal opening- & closing paranthesis.
A possible "better" solution is to avoid potentially false positives if these paranthesis can also be in a value before the delimiters:
SELECT REGEXP_REPLACE(col1, '\s*\([^()]*\)(\s*\||$)', '\1', 'g') AS col2 FROM test
Here we use a backreference to whichever option was captured in the alternation. See an online fiddle and an online regex demo
\s*
- 0 (Greedy) whitespace characters;\([^()]*\)
- 0 non-paranthesis between escaped literal opening- & closing paranthesis;(\s*\||$)
- A 1st capture group to match 0 whitespace characters (to avoid having to trim a potential trailing space later) and a literal pipe-symbol or the end-line anchor.