Efficient SQL Query to calculate portion of a row in half hourly time series that has occurred-CodePudding

I have a table that looks like this:

id	slot	total
1	2022-12-01T12:00	100
2	2022-12-01T12:30	150
3	2022-12-01T13:00	200

There's an index on slot already. The table has ~100mil rows (and a bunch more columns not shown here)

I want to sum the total up to the current moment in time. Let's say the time is currently 2022-12-01T12:45. If I run select * from my_table where slot < CURRENT_TIMESTAMP(), then I get back records 1 and 2.

However, in my data, the records represent forecasted sales within a time slot. I want to find the forecasts as of 2022-12-01T12:45, and so I want to find the proportion of the half hour slot of record 2 that has elapsed, and return that proportion of the total.

As of 2022-12-01T12:45 (assuming minute granularity), 50% of row 2 has elapsed, so I would expect the total to return as 150 / 2 = 75.

My current query works, but is slow. What are some ways I can optimise this, or other approaches I can take?

Also, how can we extend this solution to be generalised to any interval frequency? Maybe tomorrow we change our forecasting model and the data comes in sporadically. The hardcoded 30 would not work in that case.

select sum(fraction * total) as t from
 select total,
        LEAST(
   timestampdiff(
   minute,
   datetime,
   current_timestamp()
    ),
   30
) / 30 as fraction

  from my_table
  where slot <= current_timestamp()

CodePudding user response：

Consider computing your sum first, then remove the last element partial total. In order to keep the last element total, I'd prefer applying window functions instead of aggregations, and limit the output to the last row.

SET @current_time = CURRENT_TIMESTAMP();

WITH cte AS (
    SELECT slot, 
           SUM(total) OVER(ORDER BY slot) AS total,
           total                          AS rowtotal
    FROM my_table 
    WHERE slot < @current_time
    ORDER BY slot DESC
    LIMIT 1
)
SELECT slot, 
       total - (30 - TIMESTAMPDIFF(MINUTE, 
                                   slot,
                                   @current_time))
               /30 * rowtotal AS total
FROM cte

Check the demo here.

Note1: Adding an index on the slot field is likely to boost this query performance.

Note2: If your query is running on millions of data, your timestamp may be likely to change during the query. You could store it into a variable before the query is run (or into another cte).

CodePudding user response：

create an ondex in slot column btree as it is having high selectivity;