Home > Enterprise >  Counting running total on time series where condition is X
Counting running total on time series where condition is X

Time:01-31

I have a SQL table of date entries with three columns: date, item, and status. The table appears like this:

date item status
2023-01-01 A on
2023-01-01 B on
2023-01-01 C off
2023-01-02 A on
2023-01-02 B off
2023-01-02 C off
2023-01-02 D on
2023-01-03 A on
2023-01-03 B off
2023-01-03 C off
2023-01-03 D off

Looking at the most recent entries, I need grouped by item, the latest date and status, and a count on the running total of entries where status has not changed. For example, the output I am looking for would be:

latest_date item current_status number_of_days_on_current
2023-01-03 A on 3
2023-01-03 B off 2
2023-01-03 C off 3
2023-01-03 D off 1

How would I get the output I want in PostgreSQL 13.7?

This returns the latest date, item, and current status, but does not correctly count the number of days the item has been on the current status:

WITH CTE AS (
  SELECT 
    item, 
    date, 
    status, 
    LAG(status) OVER (PARTITION BY item ORDER BY date) AS prev_status, 
    ROW_NUMBER() OVER (PARTITION BY item ORDER BY date DESC) AS rn
  FROM 
    schema.table
)
SELECT 
  MAX(date) AS latest_date, 
  item, 
  status AS current_status, 
  SUM(CASE WHEN prev_status = status THEN 0 ELSE 1 END) 
    OVER (PARTITION BY item ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS number_of_days
FROM 
  CTE 
WHERE 
  rn = 1 
GROUP BY
item, status, prev_status, date
ORDER BY 
  item

CodePudding user response:

Using a subquery to find the maximum date and status entries:

select date(t.dt), t.item, e1.status, (select sum(case when e2.status = e1.status and t.item = e2.item then 1 end) 
   from entries e2) number_of_days_on_current 
from (select e.item, max(e.date) dt from entries e group by e.item) t 
join entries e1 on e1.date = date(t.dt) and e1.item = t.item

See fiddle.

CodePudding user response:

You could use the last_value function as the following:

with last_status as
(
  select *, 
      last_value(status) over (partition by item order by date_ 
        range between unbounded preceding and unbounded following) current_status
  from table_name
)
select max(date_) latest_date, 
       item,
       current_status, 
       count(*) number_of_days_on_current
from last_status
where current_status = status
group by item, current_status
order by item

See demo

  • Related