Home > Back-end >  re2 bigquery regex remove last period from each word in a string
re2 bigquery regex remove last period from each word in a string

Time:12-10

I want the period removed from each word in a senetence but only if the word is greater then a length of 1. Remove eriod from Spain. but not P. I have tried the following. It words on the first word Spain. but has the undesired effect of replacing U.S.A. with US.A and not U.S.A There could be any combination of countries or country codes ending in a period.

WITH
  t1 AS (
  SELECT
    'Spain. S.S. Spain.' sentence,
    'Spain S.S Spain' expected
  UNION ALL
  SELECT
    'U.S.A. S.S. Spain.',
    'U.S.A S.S Spain'
  UNION ALL
  SELECT
    'P. SMITH S.S. Spain.',
    'P. SMITH S.S Spain' )
SELECT
  sentence,
  REGEXP_REPLACE(sentence,r'(\w )(\.)(. )','\\1\\3') AS actual,
  expected
FROM
  t1;

enter image description here

CodePudding user response:

We can replace (\S )\. with the first capture group:

SELECT sentence, REGEXP_REPLACE(sentence, r'(\S{2,})\.', r'\1') AS expected
FROM t1;
  • Related