Basically, I want to remove the whitespace that exists after numbers
Input:
medication_title
CLORIDRATO DE VENLAFAXINA 75 MG
VIIBRYD 40 MG
KTRIZ UNO 0.6 U/G
Ouput:
medication_title medication_title2
CLORIDRATO DE VENLAFAXINA 75 MG CLORIDRATO DE VENLAFAXINA 75MG
VIIBRYD 40 MG VIIBRYD 40MG
KTRIZ UNO 0.6 U/G KTRIZ UNO 0.6U/G
Ideas?
CodePudding user response:
We can use a regex replacement here:
SELECT
medication_title,
REGEXP_REPLACE(medication_title,
'\y(\d (?:\.\d )?) ([^[:space:]]*G)\y',
'\1\2') AS medication_title2
FROM yourTable;
CodePudding user response:
You can capture the sequences with a regular expression and then assemble them back as needed, as in regexp_replace(x, '([^0-9]*[0-9]) ([^0-9\.] )', '\1\2')
.
For example:
select *, regexp_replace(x, '([^0-9]*[0-9]) ([^0-9\.] )', '\1\2') as y
from (
select 'CLORIDRATO DE VENLAFAXINA 75 MG'
union all select 'VIIBRYD 40 MG'
union all select 'KTRIZ UNO 0.6 U/G '
) t (x)
Result:
x y
-------------------------------- ------------------------------
CLORIDRATO DE VENLAFAXINA 75 MG CLORIDRATO DE VENLAFAXINA 75MG
VIIBRYD 40 MG VIIBRYD 40MG
KTRIZ UNO 0.6 U/G KTRIZ UNO 0.6U/G
CodePudding user response:
Use regexp_replace and this regular expression: (\d)\s([a-z])
, i.e. digit-whitespace-letter. Replace with the two captured characters w/o the whitespace.
select regexp_replace
(
$$CLORIDRATO DE VENLAFAXINA 75 MG
VIIBRYD 40 MG
KTRIZ UNO 0.6 U/G$$,
'(\d)\s([a-z])',
'\1\2', 'gi'
);