I have a table that contains two columns - date and logical value 'flag', I need to write a query that will create a third column that will show how many times flag appeared in a row.
What are the possible ways to do it?
Here is the desired result:
date | flag | flag_in_a_row |
---|---|---|
1/1/2022 | 1 | 1 |
1/2/2022 | 1 | 2 |
1/3/2022 | 0 | 0 |
1/4/2022 | 1 | 1 |
1/5/2022 | 1 | 2 |
1/6/2022 | 1 | 3 |
1/7/2022 | 0 | 0 |
1/8/2022 | 1 | 1 |
CodePudding user response:
Schema (PostgreSQL v13)
CREATE TABLE test (
date date,
flag INT
);
INSERT INTO test VALUES ('1/1/2022', 1);
INSERT INTO test VALUES ('1/2/2022', 1);
INSERT INTO test VALUES ('1/3/2022', 0);
INSERT INTO test VALUES ('1/4/2022', 1);
INSERT INTO test VALUES ('1/5/2022', 1);
INSERT INTO test VALUES ('1/6/2022', 1);
INSERT INTO test VALUES ('1/7/2022', 0);
INSERT INTO test VALUES ('1/8/2022', 1);
Query #1
SELECT date, flag, SUM(flag) OVER(PARTITION BY grp ORDER BY date ) AS flag_in_a_row
FROM (
SELECT *, SUM(incr) OVER(ORDER BY date) AS grp
FROM (
SELECT *, CASE WHEN flag = LAG(flag) OVER(ORDER BY date) THEN NULL ELSE 1 END AS incr
FROM test
) q
) q;
date | flag | flag_in_a_row |
---|---|---|
2022-01-01T00:00:00.000Z | 1 | 1 |
2022-01-02T00:00:00.000Z | 1 | 2 |
2022-01-03T00:00:00.000Z | 0 | 0 |
2022-01-04T00:00:00.000Z | 1 | 1 |
2022-01-05T00:00:00.000Z | 1 | 2 |
2022-01-06T00:00:00.000Z | 1 | 3 |
2022-01-07T00:00:00.000Z | 0 | 0 |
2022-01-08T00:00:00.000Z | 1 | 1 |
CodePudding user response:
I don't know your business logic. But I write some ways with query samples for you.
- You can create a function, using return data by
for-loop
. And inside thefor-loop
you can easily be doing this.
Example:
CREATE OR REPLACE FUNCTION view_flag_table()
RETURNS SETOF flags
LANGUAGE plpgsql
AS $function$
declare
r flags%rowtype;
p_count integer;
begin
p_count = 0;
FOR r IN
SELECT "date", "flag", 0 FROM flags
loop
if (r."flag" = 1) then
p_count = p_count 1;
else
p_count = 0;
end if;
r."flag_in_row" = p_count;
RETURN NEXT r;
END LOOP;
RETURN;
end;
$function$;
-- RESULT:
2022-01-01 1 1
2022-01-02 1 2
2022-01-03 0 0
2022-01-04 1 1
2022-01-05 1 2
2022-01-06 1 3
2022-01-07 0 0
2022-01-08 1 1
But, this function gets low performance if you have a million data.
- Second way, you can write the trigger function, if when will be inserting data into the table then the trigger function gets the last record values and during the insertion process, you can write calculate your value into the "flag_in_a_row" field one time, on inserting process. And for any selecting table process, we will not use additional calculating.