Home > Enterprise >  1 step regexp_replace instead of two steps part ii, replace spaces either side of multiple words bet
1 step regexp_replace instead of two steps part ii, replace spaces either side of multiple words bet

Time:06-02

This a follow on from "1 step regexp_replace instead of two steps" as I did not provide enough sample data. @hatless provided me with a solution for removing spaces between , , and also @lemon suggested I provide more data

spaces should be removed from either side of words between delimiters. " new york " should be "new york" There could be spaces on either side of the word which should be removed but now between words. The comma delimited string could have any number of delimiters up to 8 comma's.

I could do multiple nested replaces of ', ' and ' ,' which would work for most cases except where there is more then one space before or after the comma. Can it be done with one regexp_replace or does it require multiple?

**"RESULT BEFORE" **
"university of washington, seattle, washington"
"university of washington, seattle , washington"
"university of washington, , washington"
"university of washington, seattle, washington"
"university of new york,ny , usa"
"university of new york,new york , usa"
"university of new york, new york , usa"
with t1 as 
(
select           1 id,"university of washington,           seattle, washington" location
union all select 2 id,"university of washington, seattle  , washington"
union all select 3 id,"university of washington,      , washington"
union all select 4 id,"university of washington, seattle            , washington"
union all select 5 id,"university of new york,ny  , usa"
union all select 6 id,"university of new york,new york  , usa"
union all select 7 id,"university of new york, new york  , usa"
)
select id,REGEXP_REPLACE(lower(location),r'([^,] ,)[, ] ', r'\1') location
from t1
order by 1;
**"DESIRED RESULT" **
"university of washington,seattle,washington"
"university of washington,seattle,washington"
"university of washington,washington"
"university of washington,seattle,washington"
"university of new york,ny,usa"
"university of new york,new york,usa"
"university of new york,new york,usa"

ACTUAL RESULT

id **location **
1 university of washington,seattle,washington
2 university of washington,seattle ,washington
3 university of washington,washington
4 university of washington,seattle ,washington
5 university of new york,ny ,usa
6 university of new york,new york ,usa
7 university of new york,new york ,usa

CodePudding user response:

Consider below

select id,
  regexp_replace(trim(location), r'\s*,\s*', ',') as location,
from t1
order by 1            

if applied to sample data in your question - output is

enter image description here

CodePudding user response:

Try this version:

SELECT id, REGEXP_REPLACE(LOWER(location),
                          r'([^,] )\s*,\s*([^,]*?)\s*,\s*(.*?)\s*',
                          r'\1,\2,\3') AS location
FROM t1
ORDER BY 1;

Here is a working demo showing that the regex replacement logic is working.

  • Related