Writing a query to create a column for storing the number of identical values in a row-CodePudding

I have a table that contains two columns - date and logical value 'flag', I need to write a query that will create a third column that will show how many times flag appeared in a row.

What are the possible ways to do it?

Here is the desired result:

date	flag	flag_in_a_row
1/1/2022	1	1
1/2/2022	1	2
1/3/2022	0	0
1/4/2022	1	1
1/5/2022	1	2
1/6/2022	1	3
1/7/2022	0	0
1/8/2022	1	1

CodePudding user response：

Schema (PostgreSQL v13)

CREATE TABLE test (
  date date,
  flag INT
);
INSERT INTO test VALUES ('1/1/2022', 1);
INSERT INTO test VALUES ('1/2/2022', 1);
INSERT INTO test VALUES ('1/3/2022', 0);
INSERT INTO test VALUES ('1/4/2022', 1);
INSERT INTO test VALUES ('1/5/2022', 1);
INSERT INTO test VALUES ('1/6/2022', 1);
INSERT INTO test VALUES ('1/7/2022', 0);
INSERT INTO test VALUES ('1/8/2022', 1);

Query #1

SELECT date, flag, SUM(flag) OVER(PARTITION BY grp ORDER BY date ) AS flag_in_a_row
  FROM   (  
    SELECT *, SUM(incr) OVER(ORDER BY date) AS grp
    FROM   (
      SELECT *, CASE WHEN flag = LAG(flag) OVER(ORDER BY date) THEN NULL ELSE 1 END AS incr
      FROM test
    ) q
  ) q;

date	flag	flag_in_a_row
2022-01-01T00:00:00.000Z	1	1
2022-01-02T00:00:00.000Z	1	2
2022-01-03T00:00:00.000Z	0	0
2022-01-04T00:00:00.000Z	1	1
2022-01-05T00:00:00.000Z	1	2
2022-01-06T00:00:00.000Z	1	3
2022-01-07T00:00:00.000Z	0	0
2022-01-08T00:00:00.000Z	1	1

View on DB Fiddle

CodePudding user response：

I don't know your business logic. But I write some ways with query samples for you.

You can create a function, using return data by for-loop. And inside the for-loop you can easily be doing this.

Example:

CREATE OR REPLACE FUNCTION view_flag_table()
 RETURNS SETOF flags
 LANGUAGE plpgsql
AS $function$
declare 
    r flags%rowtype; 
    p_count integer;
begin

    p_count = 0; 
    
    FOR r IN
        SELECT "date", "flag", 0 FROM flags
    loop
        if (r."flag" = 1) then 
            p_count = p_count   1;
        else 
            p_count = 0;
        end if; 
       
        r."flag_in_row" = p_count;
       
        RETURN NEXT r;
    END LOOP;
   
    RETURN;
  
end;
$function$;


-- RESULT: 
2022-01-01  1   1
2022-01-02  1   2
2022-01-03  0   0
2022-01-04  1   1
2022-01-05  1   2
2022-01-06  1   3
2022-01-07  0   0
2022-01-08  1   1

But, this function gets low performance if you have a million data.

Second way, you can write the trigger function, if when will be inserting data into the table then the trigger function gets the last record values and during the insertion process, you can write calculate your value into the "flag_in_a_row" field one time, on inserting process. And for any selecting table process, we will not use additional calculating.