Home > Back-end >  PostgreSQL array - Remove consecutive identical values
PostgreSQL array - Remove consecutive identical values

Time:12-08

In a PostgreSQL text array I want to remove consecutive identical values. A distinct isn't enough because I can have duplicate values but not consecutive, and I want to keep them. The order of values have importance.

For example

SELECT ARRAY['A', 'B', 'C', 'C', 'D', 'A'];

Should return {A,B,C,D,A}

-- Edit

AS @MkSpring Mk said, this post suggest an answer. I try to adapt it:

WITH q_array AS (
    SELECT ARRAY['A', 'B', 'C', 'C', 'D', 'A'] AS full_array
    ), q_unnest AS (
        SELECT  
        unnest(full_array) AS unnest_array
        FROM q_array
        ), q_id AS (
            SELECT
            row_number() OVER () AS id,
            unnest_array
            FROM q_unnest
        )
SELECT
array_agg(q_id.unnest_array) AS array_logical
FROM (SELECT q_id.*, lag(q_id.unnest_array) OVER (ORDER BY q_id.id) AS unnest_array_logical FROM q_id) q_id
WHERE unnest_array_logical IS DISTINCT FROM q_id.unnest_array

I find this syntax very verbose, maybe my approach is not efficient enough.

Is this syntax ok ? Is it a best practice to create a function or can I write it directly in a query ?

CodePudding user response:

This post was answered by your question. Please review. I hope it helped.

Remove consecutive duplicate rows in Postgresql

CodePudding user response:

To improve query performance, I made this function:

CREATE OR REPLACE FUNCTION array_remove_consecutive_duplicates(
    array_vals anyarray)
    RETURNS TABLE(array_logical anyarray) 
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE PARALLEL UNSAFE
    ROWS 1000

AS $BODY$
    BEGIN
      RETURN QUERY 
      WITH q_array AS (
    SELECT array_vals AS full_array
    ), q_unnest AS (
        SELECT  
        unnest(full_array) AS unnest_array
        FROM q_array
        ), q_id AS (
            SELECT
            row_number() OVER () AS id,
            unnest_array
            FROM q_unnest
        )
SELECT
array_agg(q_id.unnest_array) AS array_logical
FROM (SELECT q_id.*, lag(q_id.unnest_array) OVER (ORDER BY q_id.id) AS unnest_array_logical FROM q_id) q_id
WHERE unnest_array_logical IS DISTINCT FROM q_id.unnest_array;
    END;
  
$BODY$;

My query will now write like this:

SELECT array_remove_consecutive_duplicates(ARRAY['A', 'B', 'C', 'C', 'D', 'A']);

It's a bit faster and easier to use in a query than my fist approach. Maybe it's not the good way or my syntax isn't efficient. If someone have a better way to suggest, I'll take it.

  • Related