Home > Software engineering >  How to remove empty words from my SQL string_to_array?
How to remove empty words from my SQL string_to_array?

Time:06-20

I have a text column in my Postgres table and I want to remove any non-alphabetical characters and split it by a space (so I can use each word to do a search later).

I can remove the characters and split it successfully however my rows have empty results. I cannot have any empty words:

SELECT
    asset.id,
    string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z ]', '', 'g')), ' ')
FROM
    assets asset

eg.
Hello world! becomes {Hello,world}
but also Some Result V1.0 - 3D Model becomes {Some,Result,V,,D,Model

How do I filter my array to remove empty words?

CodePudding user response:

You can transform array to table, then clean the content, and transform result to array again:

CREATE OR REPLACE FUNCTION public.remove_empty_string(text[])
 RETURNS text[]
 LANGUAGE sql
AS $function$
select array(select v from unnest($1) g(v) where v <> '')
$function$

(2022-06-20 05:17:57) postgres=# select remove_empty_string(array['Some','Result','V','','D','Model']);
┌─────────────────────────┐
│   remove_empty_string   │
╞═════════════════════════╡
│ {Some,Result,V,D,Model} │
└─────────────────────────┘
(1 row)

CodePudding user response:

Try using a multicharacter regular expression

Try with this:

SELECT
    asset.id,
    string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z] ', ' ', 'g')), ' ')
FROM
    assets asset

I just removed the whitespace from the regex in order to capture every non-alphabet character. Then, the indicates one or more coincidences, which ensures than any consecutive non-alphabetical character will get replaced with a single space. Finally, as you already do the trim, your split by whitespace will work.

  • Related