I want the period removed from each word in a senetence but only if the word is greater then a length of 1. Remove eriod from Spain. but not P. I have tried the following. It words on the first word Spain. but has the undesired effect of replacing U.S.A. with US.A and not U.S.A There could be any combination of countries or country codes ending in a period.
WITH
t1 AS (
SELECT
'Spain. S.S. Spain.' sentence,
'Spain S.S Spain' expected
UNION ALL
SELECT
'U.S.A. S.S. Spain.',
'U.S.A S.S Spain'
UNION ALL
SELECT
'P. SMITH S.S. Spain.',
'P. SMITH S.S Spain' )
SELECT
sentence,
REGEXP_REPLACE(sentence,r'(\w )(\.)(. )','\\1\\3') AS actual,
expected
FROM
t1;
CodePudding user response:
We can replace (\S )\.
with the first capture group:
SELECT sentence, REGEXP_REPLACE(sentence, r'(\S{2,})\.', r'\1') AS expected
FROM t1;