Home > OS >  How to count rows after the occurence of a value by group (postgresql)
How to count rows after the occurence of a value by group (postgresql)

Time:05-24

I have for example the following table:

Name Day Healthy
Jon 1 No
Jon 2 Yes
Jon 3 Yes
Jon 4 Yes
Jon 5 No
Mary 1 Yes
Mary 2 No
Mary 3 Yes
Mary 4 No
Mary 5 Yes

I want to add a column which counts the number of following days after day X a person was healthy:

Name Day Healthy Number of days the person was healthy after day X (incl.)
Jon 1 No 3
Jon 2 Yes 3
Jon 3 Yes 2
Jon 4 Yes 1
Jon 5 No 0
Mary 1 Yes 3
Mary 2 No 2
Mary 3 Yes 2
Mary 4 No 1
Mary 5 Yes 1

Is it possible to use some sort of window function to create such a column? Thanks a lot for the help!

CodePudding user response:

There are a couple of ways to do this with a window function. One is to order by day descending and use the default window. The other is to specify the window from the current row to the end of the partition.

This example casts the boolean healthy as an int so that it can be summed. If your table has literal Yes and No strings, then you can use sum((healthy = 'yes')::int) over (...) to achieve the same thing.

select name, day, 
       sum(healthy::int) 
         over (partition by name 
                   order by day
            rows between current row
                     and unbounded following) as num_subsequent_health_days
  from my_table;

 name | day | num_subsequent_health_days
 :--- | --: | -------------------------:
 Jon  |   1 |                          3
 Jon  |   2 |                          3
 Jon  |   3 |                          2
 Jon  |   4 |                          1
 Jon  |   5 |                          0
 Mary |   1 |                          3
 Mary |   2 |                          2
 Mary |   3 |                          2
 Mary |   4 |                          1
 Mary |   5 |                          1

db<>fiddle here

CodePudding user response:

I assume your relation has the following schema:

CREATE TABLE test(name text, day int, healthy boolean);

Then this should produce the desired result:

SELECT name, day, sum(mapped) OVER (PARTITION BY name ORDER BY day DESC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM (SELECT name, day, CASE WHEN healthy THEN 1 ELSE 0 END AS mapped FROM test) sub ORDER BY name, day;
  • Related