SQLite Running Total (but keyed on values in a different table)

Time:11-19

I'm trying to produce a running total, but I need its value as of each row of a different table. In the following example, I can easily produce the running total for each time value in T, but I would like the running total of T for each time value in P (rather than attaching a price value to each transaction in T, which is trivial):

Given a table of transactions T like:

user   hour  item  delta
Alice  1     A      1
Alice  1     A      2
Bob    2     A      2
Alice  3     A      1
Bob    3     B      1
Alice  5     A     -1
Bob    5     B      3

And a pricing table P like:

hour  item  price
1     A     1.1
1     B     1.2
2     A     2.1
2     B     2.2
3     A     3.1
3     B     3.2
4     A     4.1
4     B     4.2
5     A     5.1
5     B     5.2

I would like a record for each hour and item in P, for every user whose running total for that item is non-zero. Something like:

hour  item  price  user   running_total
1     A     1.1    Alice  3
2     A     2.1    Alice  3
2     A     2.1    Bob    2
3     A     3.1    Alice  4
3     A     3.1    Bob    2
3     B     3.2    Bob    1
4     A     4.1    Alice  4
4     A     4.1    Bob    2
4     B     4.2    Bob    1
5     A     5.1    Alice  3
5     A     5.1    Bob    2
5     B     5.2    Bob    4

I'm fine with zeros or NULLs in place of the rows I've elided (i.e. those before Bob has any items). The crucial part I'm having trouble with is that, for each hour where an item has a price, I want every user's balance.

I'm currently doing this very stupidly in a procedural language, iterating over all hour values in P. But since I think I'm just looking for a filtered Cartesian product between a table and a running-total table, I suspect there must be a much better way of doing it.
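As a rough illustration of the imperative approach just described, here is a Python sketch. It is a hypothetical reconstruction, not the asker's actual code; the function name `balances_at_price_rows` is made up. It sweeps P in hour order, applies each transaction from T exactly once, and emits every user's non-zero balance for the item of the current P row:

```python
from collections import defaultdict

# Sample data from the question.
# T rows: (user, hour, item, delta); P rows: (hour, item, price).
T = [
    ("Alice", 1, "A", 1), ("Alice", 1, "A", 2), ("Bob", 2, "A", 2),
    ("Alice", 3, "A", 1), ("Bob", 3, "B", 1), ("Alice", 5, "A", -1),
    ("Bob", 5, "B", 3),
]
P = [
    (1, "A", 1.1), (1, "B", 1.2), (2, "A", 2.1), (2, "B", 2.2), (3, "A", 3.1),
    (3, "B", 3.2), (4, "A", 4.1), (4, "B", 4.2), (5, "A", 5.1), (5, "B", 5.2),
]


def balances_at_price_rows(transactions, prices):
    """Hypothetical sweep: for each P row, report every user's non-zero
    running total for that item, counting transactions up to that hour."""
    txns = sorted(transactions, key=lambda t: t[1])  # by hour (stable sort)
    balance = defaultdict(lambda: defaultdict(int))  # item -> user -> total
    out = []
    i = 0
    for hour, item, price in sorted(prices, key=lambda p: p[0]):
        # Apply every not-yet-applied transaction with hour <= this P hour.
        while i < len(txns) and txns[i][1] <= hour:
            user, _, t_item, delta = txns[i]
            balance[t_item][user] += delta
            i += 1
        for user, total in sorted(balance[item].items()):
            if total != 0:  # skip users with no (or zero) holdings
                out.append((hour, item, price, user, total))
    return out


for row in balances_at_price_rows(T, P):
    print(row)
```

Because both inputs are walked once in hour order, each transaction is applied a single time. That is roughly why a loop like this can beat a join that re-matches every earlier running-total row against every price row.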

My current solution, iterating over the pricing table (~3K rows in the pricing table, 10K rows in the transactions table), takes about 250 ms. The following SQL seems to do the job but takes ~25 seconds, so I am hoping there is a better way:

with ranked_b as (
    select F.*, row_number() over (partition by p_hour, user, item order by hour desc) as rn
    from (select P.hour as p_hour, P.price, B.*
          from P cross join (select distinct a.hour, a.user, a.item, sum(a.delta) over (partition by a.user, a.item order by a.hour) running_total
                             from T a order by a.hour) B on P.item = B.item and B.hour <= P.hour
          order by P_hour, B.user, B.item, B.hour) F
) SELECT p_hour as hour, item, price, user, running_total from ranked_b where rn = 1;

Answer:

I have two suggestions (simplifications) for your code.

First, an ORDER BY clause without a LIMIT inside a subquery is useless: it does not affect the final result, and it can only slow the query down.
So remove both of them, from the B and the F subqueries.

Second, you are using a CROSS JOIN together with an ON clause.
That is equivalent to an INNER JOIN, which is what you should use, because (from the "Simple Select Processing" section of the SQLite documentation):

The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators, but is handled differently by the query optimizer in that it prevents the query optimizer from reordering the tables in the join. An application programmer can use the CROSS JOIN operator to directly influence the algorithm that is chosen to implement the SELECT statement. Avoid using CROSS JOIN except in specific situations where manual control of the query optimizer is desired. Avoid using CROSS JOIN early in the development of an application as doing so is a premature optimization. The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL.
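The point of that passage can be checked directly: CROSS JOIN ... ON returns the same rows as INNER JOIN ... ON, and only the join order chosen by the planner may differ. A minimal Python sketch using the standard `sqlite3` module, with a hypothetical table B standing in for the running-total subquery:

```python
import sqlite3

# Hypothetical mini tables: B stands in for the running-total subquery.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P(hour INTEGER, item TEXT, price REAL);
    CREATE TABLE B(hour INTEGER, user TEXT, item TEXT, running_total INTEGER);
    INSERT INTO P VALUES (1, 'A', 1.1), (2, 'A', 2.1), (2, 'B', 2.2);
    INSERT INTO B VALUES (1, 'Alice', 'A', 3), (2, 'Bob', 'A', 2), (3, 'Bob', 'B', 1);
""")

ON = "ON P.item = B.item AND B.hour <= P.hour"
cross_sql = f"SELECT P.hour, B.user, B.running_total FROM P CROSS JOIN B {ON}"
inner_sql = f"SELECT P.hour, B.user, B.running_total FROM P INNER JOIN B {ON}"

# Same predicate, same rows (sorted because neither query has an ORDER BY).
cross_rows = sorted(con.execute(cross_sql).fetchall())
inner_rows = sorted(con.execute(inner_sql).fetchall())
print(cross_rows == inner_rows, cross_rows)

# The plans are where they can differ: CROSS JOIN pins P as the outer loop,
# while INNER JOIN lets the planner choose the order.
for sql in (cross_sql, inner_sql):
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print(row[-1])  # last column is the human-readable plan detail
    print("--")
```

The exact plan text varies between SQLite versions, so it is printed for inspection rather than relied upon.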

Try this:

WITH ranked_b AS (
  SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
  FROM (
    SELECT P.hour p_hour, P.price, B.*  
    FROM P 
    INNER JOIN (
      SELECT DISTINCT hour, user, item, 
             SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
      FROM T
    ) B ON P.item = B.item AND B.hour <= P.hour  
  ) F
)  
SELECT p_hour hour, item, price, user, running_total 
FROM ranked_b 
WHERE rn = 1;
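To check the query against the question's sample data, here is a small Python harness. It assumes Python's `sqlite3` module is linked against SQLite 3.25 or newer (needed for window functions) and loads the tables exactly as given in the question:

```python
import sqlite3

# Question's sample data; window functions need SQLite 3.25 or newer.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T(user TEXT, hour INTEGER, item TEXT, delta INTEGER);
    CREATE TABLE P(hour INTEGER, item TEXT, price REAL);
    INSERT INTO T VALUES ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
        ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1), ('Bob', 5, 'B', 3);
    INSERT INTO P VALUES (1, 'A', 1.1), (1, 'B', 1.2), (2, 'A', 2.1), (2, 'B', 2.2),
        (3, 'A', 3.1), (3, 'B', 3.2), (4, 'A', 4.1), (4, 'B', 4.2), (5, 'A', 5.1), (5, 'B', 5.2);
""")

# The ROW_NUMBER() query from the answer, unchanged.
QUERY = """
WITH ranked_b AS (
  SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
  FROM (
    SELECT P.hour p_hour, P.price, B.*
    FROM P
    INNER JOIN (
      SELECT DISTINCT hour, user, item,
             SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total
      FROM T
    ) B ON P.item = B.item AND B.hour <= P.hour
  ) F
)
SELECT p_hour hour, item, price, user, running_total
FROM ranked_b
WHERE rn = 1;
"""

# The query has no outer ORDER BY, so sort in Python before comparing.
rows = sorted(con.execute(QUERY).fetchall())
for row in rows:
    print(row)
```

Note that the two Alice transactions at hour 1 are peers under the window's default RANGE frame, so both get the running total 3, and SELECT DISTINCT collapses them into one row.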

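Or, here is another version that uses SQLite's bare columns (when a query contains exactly one min() or max() aggregate, the non-aggregated columns take their values from the row that holds that minimum or maximum, here the row with the latest hour):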

SELECT p_hour hour, item, price, user, running_total
FROM (
  SELECT P.hour p_hour, P.price, B.*  
  FROM P 
  INNER JOIN (
    SELECT DISTINCT hour, user, item, 
           SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
    FROM T
  ) B ON P.item = B.item AND B.hour <= P.hour  
) F
GROUP BY p_hour, user, item
HAVING MAX(hour);
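The bare-column trick can be seen in isolation with a tiny hypothetical table F. This is a sketch of SQLite-specific behavior, and it assumes, as the query above does, that a single max() placed in HAVING drives the bare-column selection. Most other databases reject a non-grouped, non-aggregated column like this.

```python
import sqlite3

# Hypothetical table F: one row per (user, hour) with its running total.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE F(user TEXT, hour INTEGER, running_total INTEGER);
    -- Alice's latest row (hour 5) is deliberately not the last one inserted.
    INSERT INTO F VALUES ('Alice', 1, 3), ('Alice', 5, 7), ('Alice', 3, 4), ('Bob', 2, 2);
""")

# running_total is a bare column; with exactly one max() in the query,
# SQLite takes its value from the row holding MAX(hour) in each group.
rows = sorted(con.execute(
    "SELECT user, running_total FROM F GROUP BY user HAVING MAX(hour)"
).fetchall())
print(rows)
```

HAVING MAX(hour) also acts as a filter: a group whose maximum hour is 0 or NULL would be dropped. With the question's hours starting at 1, that never happens.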

