In migrating a text string from one database to another, I need to eliminate all special characters and keep only one space between words.
Unfortunately, the below code eliminates all spaces between words -- not what I want.
Here is the code I have. The "sig" field is wrong (it removes all spaces without leaving one space). Where did I go wrong?
TIA
WITH dbl_medications AS (
SELECT *
FROM dblink('select medname, sig, form from medications')
AS t1(medname text, sig text, form text)
ORDER BY medname, form, sig
)
INSERT INTO medications (medname, sig, form)
SELECT REGEXP_REPLACE(LOWER(REGEXP_REPLACE(medname,'[^a-zA-Z0-9 /-]','','g')), '^ | $| (?= )', '', 'g'),
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^0-9a-zA-Z:/]',' ','g')), '^ | $| (?= )', '', 'g'),
LOWER(REGEXP_REPLACE(form,'[^a-zA-Z]','','g'))
FROM dbl_medications
ORDER BY 1,3,2
ON CONFLICT (medname, sig, form) DO NOTHING;
CodePudding user response:
You can use
REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^[:alnum:][:space:]:/] ',' ','g')), '^[[:space:]] |[[:space:]] $|[[:space:]] (?=[[:space:]])', '', 'g')
The first [^[:alnum:][:space:]:/]
regex is used to replace chunks of one or more chars other than alphanumeric, whitespace, :
and /
chars with a single space.
The ^[[:space:]] |[[:space:]] $|[[:space:]] (?=[[:space:]])
regex is used to remove leading (^[[:space:]]
) and trailing ([[:space:]] $
) whitespaces, and remove excessive whitespace ([[:space:]] (?=[[:space:]])
).