I'm trying to come up with an SQL solution for the following problem. I have a table in which stock transactions are saved. As transactions don't occur on a daily basis the time series in the table is incomplete.
Table TRANSACTION
Date | Depot | Ticker | Buy or Sell | Shares |
---|---|---|---|---|
2022-02-01 | A | MSFT | BUY | 100 |
2022-02-04 | B | AAPL | BUY | 20 |
2022-02-05 | A | MSFT | SELL | 80 |
2022-02-08 | A | PG | BUY | 50 |
2022-02-10 | B | T | BUY | 10 |
Now I would like to write a SQL statement that gives me the daily share count per depot an ticker for every day since 2022-02-01. The expected result would be this.
Date | Depot | Ticker | Share Count |
---|---|---|---|
2022-02-01 | A | MSFT | 100 |
2022-02-02 | A | MSFT | 100 |
2022-02-03 | A | MSFT | 100 |
2022-02-04 | A | MSFT | 100 |
2022-02-04 | B | AAPL | 20 |
2022-02-05 | A | MSFT | 20 |
2022-02-05 | B | AAPL | 20 |
2022-02-06 | A | MSFT | 20 |
2022-02-06 | B | AAPL | 20 |
2022-02-07 | A | MSFT | 20 |
2022-02-07 | B | AAPL | 20 |
2022-02-08 | A | MSFT | 20 |
2022-02-08 | A | PG | 50 |
2022-02-08 | B | AAPL | 50 |
2022-02-09 | A | MSFT | 20 |
2022-02-09 | A | PG | 50 |
2022-02-09 | B | AAPL | 50 |
2022-02-10 | A | MSFT | 20 |
2022-02-10 | A | PG | 50 |
2022-02-10 | B | AAPL | 50 |
2022-02-10 | B | T | 10 |
What I'v done so far:
- A: Getting a list of all dates between 2022-02-01 and the current date (e.g. 2022-02-10).
- B: Getting a running total over the shares column by depot and ticker ordered by date.
- A Left join B
WITH RECURSIVE dates(date) AS (
VALUES('2022-02-01')
UNION ALL
SELECT date(date, ' 1 day')
FROM dates
WHERE date < DATE()
)
SELECT dt.Date
, tr.Date
, tr.Ticker
, tr.Shares AS [Share Count]
FROM dates dt
LEFT JOIN (
SELECT Date
, Depot
, Ticker
, SUM(shares) OVER (PARTITION BY Depot, Ticker ORDER BY Date ROWS UNBOUNDED PRECEDING) AS Shares
FROM TRANSACTION
ORDER BY Depot, Date, Ticker
) ON (
tr.Date <= dt.Date
)
ORDER BY dt.Date, Ticker
That doesn't seem to do the trick as the running total does not properly aggregate the date rows. The result looks more like a kind of cross join between the synthetic date table and the transaction table.
Where am I wrong? Has anyone a solution for my problem?
Thanks a lot!
CodePudding user response:
You are almost there, just expanded on your idea. I have hard-coded end date in below example, but that can be changed as per requirement.
The dates CTE is just building dates within the needed range. In the next sub-query (t1), there are two more columns added - column md for max date and column s_shares to get a running sum of shares based on if share is bought (added) or subtracted if sold.
We then select from the two sets values between dates max date (column md) and original date.
WITH RECURSIVE dates(date1) AS (
select date('2022-02-01')
UNION ALL
SELECT date(date1, ' 1 day')
FROM dates
WHERE date1 < date('2022-02-10')
) select d.date1,depot,ticker, s_shares
from dates d,
(select date1,
case
max(date1) over (partition by depot, ticker order by null)
when date1 then date('2022-02-10', ' 1 day')
else max(date1) over (partition by depot, ticker order by null)
end md,
depot, ticker, buyorsell,
sum(case when buyorsell = 'SELL' then -(Shares) else Shares end)
over (partition by depot, ticker order by Date1) s_shares
from tic) t1
where d.date1< t1.md
and d.date1>= t1.date1
order by depot, ticker, d.date1;
Refer to the fiddle here.