Home > Mobile >  Creating a conditional column based on consecutive rows with same values in BigQuery
Creating a conditional column based on consecutive rows with same values in BigQuery

Time:12-24

I have researched other questions on a similar issue, but they are either in a different language or create unwanted, unnecessary columns.

I have a table1:

id        date        value
aaaaa     2021-01-01  true
aaaaa     2021-01-02  true
aaaaa     2021-01-03  false
aaaaa     2021-01-04  false
aaaaa     2021-01-05  false
aaaaa     2021-01-06  true
...
aaaaa     2021-12-31  false
aaaab     2021-01-01  true
...
zzzzz     2021-12-31  false

where id is a string-type entry, date ranges from 2021-01-01 to 2021-12-31, and value is a boolean-type entry, either true or false. The table is ordered by date.

I would like to generate a new table table2 with just two columns, id and passed.

  • id is the same column from the table1.
  • passed is also a boolean-type entry. When ordered by date in table1, if value is false for 3 consecutive rows for an id, passed is false for that id and otherwise is true.

Ideally, table2 should look like this:

id      passed
aaaaa   false
aaaab   true
...
zzzzz   true

I tried using a subquery such as this:

SELECT id,
       value,
       COUNT(*) AS cnt
  FROM (SELECT t.*,
               ROW_NUMBER() OVER i.date AS time1,
               ROW_NUMBER() OVER (PARTITION BY i.id, i.value ORDER BY i.date) AS time2
          FROM table1 t
       ) t
GROUP BY id, value, (time1 - time2)

...

but it returns an error as it does not recognize the window alias i.date. (Not to mention that this query probably wouldn't exactly deliver my desired output yet, as I didn't even put the condition of value having to be false in 3 consecutive rows.)

Any insight on this question is appreciated.

CodePudding user response:

You could use enter image description here

CodePudding user response:

You might consider below as well.

SELECT id, MIN(flag) OR COUNT(1) < 3 AS passed FROM (
  SELECT *, LAG(value, 1, true) OVER w OR value OR LEAD(value, 1, true) OVER w AS flag
    FROM table1
  WINDOW w AS (PARTITION BY id ORDER BY date)
) GROUP BY 1;

if value is false for 3 consecutive rows, passed is false for that id

  • A OR B OR C is false only when they are all false.
  • A : previous row (LAG), B : current row, C : following row (LEAD)

CodePudding user response:

I feel like below approach is most simple and straightforward for your case

select id, 
  instr(string_agg('' || value, '' order by date), repeat('false', 3)) = 0 as passed
from your_table
group by id 

if applied to sample data in your question - output is

enter image description here

  • Related