Home > Software design >  PostgreSQL with regex: how to extract all substrings bounded by the same pair of characters
PostgreSQL with regex: how to extract all substrings bounded by the same pair of characters

Time:08-22

Given a name_loc column of text like the following:

{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}

I'm trying to extract the names, ideally separated by commas:

Charlie, Wrigley, Ana

I've gotten this far:

SELECT SUBSTRING(CAST(name_loc AS VARCHAR) from '"([^ –] )')
FROM table

which returns Charlie

How can I extend this query to extract all names?

CodePudding user response:

You can do this with a combination of regexp_matches (to extract the names), array_agg (to regroup all matches in a row) and array_to_string (to format the array as you'd like, e.g. with a comma separator):

WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere}') -- added this to show multiple rows do not get merged
)
SELECT array_to_string(names, ', ')
FROM input
CROSS JOIN LATERAL (
    SELECT array_agg(name)
    FROM regexp_matches(name_loc, '"(\w )', 'g') AS f(name)
) AS f(names);
array_to_string
Charlie, Wrigley, Ana
Other

View on DB Fiddle

CodePudding user response:

My two cents, though I'm rather new to postgreSQL and I had to copy the 1st piece from @Marth's his answer:

WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere"}') 
)
SELECT REGEXP_REPLACE(name_loc, '{?(,)?"(\w )[^"] "}?','\1\2', 'g') FROM input;
regexp_replace
Charlie,Wrigley,Ana
Other
  • Related