Home > Net >  How to remove all special characters and extra spaces between words with Regex in PostgreSQL
How to remove all special characters and extra spaces between words with Regex in PostgreSQL

Time:06-25

In migrating a text string from one database to another, I need to eliminate all special characters and keep only one space between words.

Unfortunately, the below code eliminates all spaces between words -- not what I want.

Here is the code I have. The "sig" field is wrong (it removes all spaces without leaving one space). Where did I go wrong?
TIA

WITH dbl_medications AS (
    SELECT * 
    FROM dblink('select medname, sig, form from medications')
    AS t1(medname text, sig text, form text)
    ORDER BY medname, form, sig
    )
INSERT INTO medications (medname, sig, form)    
    SELECT REGEXP_REPLACE(LOWER(REGEXP_REPLACE(medname,'[^a-zA-Z0-9 /-]','','g')), '^  |  $|  (?= )', '', 'g'),
           REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^0-9a-zA-Z:/]',' ','g')), '^  |  $|  (?= )', '', 'g'),
           LOWER(REGEXP_REPLACE(form,'[^a-zA-Z]','','g'))         
    FROM dbl_medications
    ORDER BY 1,3,2
ON CONFLICT (medname, sig, form) DO NOTHING;

CodePudding user response:

You can use

REGEXP_REPLACE(LOWER(REGEXP_REPLACE(sig,'[^[:alnum:][:space:]:/] ',' ','g')), '^[[:space:]] |[[:space:]] $|[[:space:]] (?=[[:space:]])', '', 'g')

The first [^[:alnum:][:space:]:/] regex is used to replace chunks of one or more chars other than alphanumeric, whitespace, : and / chars with a single space.

The ^[[:space:]] |[[:space:]] $|[[:space:]] (?=[[:space:]]) regex is used to remove leading (^[[:space:]] ) and trailing ([[:space:]] $) whitespaces, and remove excessive whitespace ([[:space:]] (?=[[:space:]])).

  • Related