I have a text column in my Postgres table and I want to remove any non-alphabetical characters and split it by a space (so I can use each word to do a search later).
I can remove the characters and split it successfully however my rows have empty results. I cannot have any empty words:
SELECT
asset.id,
string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z ]', '', 'g')), ' ')
FROM
assets asset
eg.
Hello world!
becomes {Hello,world}
but also Some Result V1.0 - 3D Model
becomes {Some,Result,V,,D,Model
How do I filter my array to remove empty words?
CodePudding user response:
You can transform array to table, then clean the content, and transform result to array again:
CREATE OR REPLACE FUNCTION public.remove_empty_string(text[])
RETURNS text[]
LANGUAGE sql
AS $function$
select array(select v from unnest($1) g(v) where v <> '')
$function$
(2022-06-20 05:17:57) postgres=# select remove_empty_string(array['Some','Result','V','','D','Model']);
┌─────────────────────────┐
│ remove_empty_string │
╞═════════════════════════╡
│ {Some,Result,V,D,Model} │
└─────────────────────────┘
(1 row)
CodePudding user response:
Try using a multicharacter regular expression
Try with this:
SELECT
asset.id,
string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z] ', ' ', 'g')), ' ')
FROM
assets asset
I just removed the whitespace from the regex in order to capture every non-alphabet character. Then, the indicates one or more coincidences, which ensures than any consecutive non-alphabetical character will get replaced with a single space. Finally, as you already do the trim, your split by whitespace will work.