What I have: A single table with ids and prices for date and time.
What I want to do: To get the average difference of price for every id between the first and 15 minute of every day. Conditions: I also want to use the last available price in 15 minutes if the 15 is missing. For example if 15 is missing, use the price in 14 minutes.
What I did: My first step was to get the price after 15 minutes of every day for every id.
My code (I did a lot of tests with it, sorry if it doesn't look good and optimised enough) to get the last price(but not last available) in the 15 minutes interval:
WITH s as (
SELECT DATE(date_value) as dt, date_value, stock_id sid, open FROM stocks_1min
)
SELECT sid, dt, date_value, s.open as second_price
FROM s
INNER JOIN (
SELECT q.stock_id, DATE(q.date_value) as dat
FROM stocks_1min q
GROUP BY q.stock_id, dat
ORDER BY q.stock_id ASC
) as end_price
ON s.sid = end_price.stock_id
AND s.date_value = (SELECT date_value FROM s WHERE s.sid = end_price.stock_id AND s.dt = end_price.dat LIMIT 1) interval '15 minutes'
ORDER BY sid, dt;
What I didn't do:
- get the last AVAILABLE price in that interval.
- get the first price in that interval and use it in the same query with the last price in the interval
- If I got both prices then I can use AVG and Group By to get the needed value.
Expected result (example): Table:
|stock_id|open(price)|date_value |
|1 |10 |01-12-2020 09:31|
|1 |12 |01-12-2020 09:46|
|1 |14 |02-12-2020 09:31|
|1 |20 |02-12-2020 09:46|
|2 |14 |01-12-2020 09:31|
|2 |16 |01-12-2020 09:46|
|2 |2 |02-12-2020 09:31|
|2 |16 |02-12-2020 09:46|
Result:
|stock_id|average_difference|
| 1 | 4 | ((12-10) (20-14)) / 2 = 4
| 2 | 8 | ((16-14) (16-2)) / 2 = 8
I have never used "Partition By", but maybe here is the place to start with it? It's my own project, where I want to test some statistical ideas for the stock market. Thanks in advance!
You can use this link where it's a sample for testing: http://sqlfiddle.com/#!17/83597/1
CodePudding user response:
If I have understood correctly, you want to:
- List the prices of each stock from each day during the first 15 minutes of the day
- Identify the first and last (as defined by date_value) of the prices listed in step 1
- Calculate the difference between the prices identified in step 2, by stock and day
- Calculate the average of the differences calculated in step 3, by stock
If I have interpreted your question correctly, then:
WITH dat
AS
(
SELECT stock_id
,date_value::date AS date_of_date_value
,date_value
,MIN(date_value) OVER (PARTITION BY stock_id, date_value::date) AS min_date_value
,MAX(date_value) OVER (PARTITION BY stock_id, date_value::date) AS max_date_value
,"open" AS open_price
FROM stocks_1min
WHERE date_value::time <= '09:45:00.000'
),
dat2
AS
(
SELECT stock_id
,date_of_date_value
,AVG(CASE WHEN min_date_value = date_value THEN open_price ELSE NULL END) AS open_price_from
,AVG(CASE WHEN max_date_value = date_value THEN open_price ELSE NULL END) AS open_price_to
,AVG(CASE WHEN max_date_value = date_value THEN open_price ELSE NULL END)
- AVG(CASE WHEN min_date_value = date_value THEN open_price ELSE NULL END) AS open_price_diff
FROM dat
WHERE date_value = min_date_value OR date_value = max_date_value
GROUP BY stock_id
,date_of_date_value
)
SELECT stock_id, AVG(open_price_diff) AS avg_open_price_diff
FROM dat2
GROUP BY stock_id;
You may find that you need to wrap the calculation of open_price_diff with ABS
, but I have not done that in the example above.
CodePudding user response:
I continued with the idea from your SQL and tried to get the solution without analytical function which you can if you want to. Here are the steps:
Step 1: Minimum recorded time for each day.
with day_min_time
as
(
SELECT Min(date_value) min_time,
Date(date_value) dt,
stock_id
FROM stock_1min
GROUP BY Date(date_value),
stock_id
)
Step 2: Maximum recorded time for each day which is 15 minutes from minimum recorded time for the day.It is using the CTE from step 1.
with day_max_time
as
(
SELECT Max(sd.date_value) max_time ,
Date(sd.date_value) dt,
sd.stock_id
FROM stock_1min sd
JOIN day_min_time dmt
ON sd.stock_id = dmt.stock_id
AND Date(sd.date_value) = dmt.dt
WHERE sd.date_value <= dmt.min_time interval '15 minutes'
GROUP BY date(sd.date_value) ,
sd.stock_id)
Step 3: Use the time boundaries obtained from Step 1 and Step 2 to get the maximum and minimum prices.
SELECT Max(sd.open) - Min(sd.open) price_difference,
sd.stock_id,
DATE(sd.date_value)
FROM stock_1min sd
INNER JOIN day_min_time min_dmt
ON min_dmt.stock_id = sd.stock_id
AND DATE(sd.date_value) = min_dmt.dt
INNER JOIN day_max_time max_dmt
ON max_dmt.stock_id = sd.stock_id
AND DATE(sd.date_value) = max_dmt.dt
WHERE sd.date_value BETWEEN min_dmt.min_time AND max_dmt.max_time
GROUP BY sd.stock_id,
DATE(sd.date_value)
Step 4: Get the average difference using the result from step 3:
SELECT Round(Avg(difference)) average_difference,
stock_id
FROM (SELECT Max(sd.open) - Min(sd.open) price_difference,
sd.stock_id,
DATE(sd.date_value)
FROM stock_1min sd
INNER JOIN day_min_time min_dmt
ON min_dmt.stock_id = sd.stock_id
AND DATE(sd.date_value) = min_dmt.dt
INNER JOIN day_max_time max_dmt
ON max_dmt.stock_id = sd.stock_id
AND DATE(sd.date_value) = max_dmt.dt
WHERE sd.date_value BETWEEN min_dmt.min_time AND max_dmt.max_time
GROUP BY sd.stock_id,
DATE(sd.date_value)) final_inline_view
GROUP BY stock_id;
Final SQL:Combining Everything:
WITH
day_min_time AS
(
SELECT Min(date_value) min_time,
Date(date_value) dt,
stock_id
FROM stock_1min
GROUP BY Date(date_value),
stock_id ),
day_max_time AS
(
SELECT Max(sd.date_value) max_time ,
Date(sd.date_value) dt,
sd.stock_id
FROM stock_1min sd
JOIN day_min_time dmt
ON sd.stock_id = dmt.stock_id
AND Date(sd.date_value) = dmt.dt
WHERE sd.date_value <= dmt.min_time interval '15 minutes'
GROUP BY date(sd.date_value) ,
sd.stock_id )
SELECT round(avg(price_difference)) average_difference,
stock_id
FROM (
SELECT max(sd.OPEN)-min(sd.OPEN) price_difference,
sd.stock_id ,
date(sd.date_value)
FROM stock_1min sd
INNER JOIN day_min_time min_dmt
ON min_dmt.stock_id = sd.stock_id
AND date(sd.date_value) = min_dmt.dt
INNER JOIN day_max_time max_dmt
ON max_dmt.stock_id = sd.stock_id
AND date(sd.date_value) = max_dmt.dt
WHERE sd.date_value BETWEEN min_dmt.min_time AND max_dmt.max_time
GROUP BY sd.stock_id ,
date(sd.date_value) ) final_inline_view
GROUP BY stock_id;
Output:
average_difference | stock_id
-------------------- ----------
4 | 1
8 | 2
(2 rows)
Table:
create table stock_1min(stock_id int , open int , date_value timestamp);
insert into stock_1min values
(1,10,'2020-12-01 09:31'),
(1,12,'2020-12-01 09:46'),
(1,14,'2020-12-02 09:31'),
(1,20,'2020-12-02 09:46'),
(2,14,'2020-12-01 09:31'),
(2,16,'2020-12-01 09:46'),
(2,2,'2020-12-02 09:31'),
(2,16,'2020-12-02 09:46');
postgres=# select * from stock_1min;
stock_id | open | date_value
---------- ------ ---------------------
1 | 10 | 2020-12-01 09:31:00
1 | 12 | 2020-12-01 09:46:00
1 | 14 | 2020-12-02 09:31:00
1 | 20 | 2020-12-02 09:46:00
2 | 14 | 2020-12-01 09:31:00
2 | 16 | 2020-12-01 09:46:00
2 | 2 | 2020-12-02 09:31:00
2 | 16 | 2020-12-02 09:46:00
(8 rows)