I have some old intraday 1m bar data that I am trying to clean up for an application I am using, that requires 390 1m bar ticks (9:30am - 16:00pm) of intraday data, per day, per stock. Unfortunately the older the data, the more gaps appear as the data only contains that which actually traded during said minute. ie: no trades = no volume = no data.
PCT_CHANGE is calculated by the underlying (CLOSE-OPEN)/OPEN*100 not shown, and reflects the percentage change since the OPEN to that of the CLOSE of the current TICK.
Example:
TICK DATA_TICK PCT_CHANGE VOLUME
9:30:00 AM
9:31:00 AM
9:32:00 AM
9:33:00 AM
9:34:00 AM
9:35:00 AM 9:35:00 AM 0 15500
9:36:00 AM 9:36:00 AM 0.06 1500
9:37:00 AM
9:38:00 AM 9:38:00 AM 0.24 4000
9:39:00 AM 9:39:00 AM 0.2 4500
9:40:00 AM 9:40:00 AM 0.34 500
9:41:00 AM
9:42:00 AM 9:42:00 AM 0.34 500
9:43:00 AM
9:44:00 AM
9:45:00 AM
9:46:00 AM
9:47:00 AM 9:47:00 AM 0.13 2000
9:48:00 AM 9:48:00 AM 0.13 1000
9:49:00 AM
9:50:00 AM 9:50:00 AM 0.22 500
9:51:00 AM 9:51:00 AM 0.24 2500
9:52:00 AM 9:52:00 AM 0.24 1000
9:53:00 AM 9:53:00 AM 0.24 2000
9:54:00 AM
9:55:00 AM 9:55:00 AM 0.13 500
9:56:00 AM
9:57:00 AM 9:57:00 AM 0.13 2000
9:58:00 AM 9:58:00 AM 0.24 2000
9:59:00 AM 9:59:00 AM 0.24 500
10:00:00 AM 10:00:00 AM 0.13 500
So what I would like to be able to do is populate the missing 1m intervals PCT_CHANGE value with that of the previous existing row, and in the case of the opening minutes, then 0 (zero). Volume for all inserted values would be 0.
example
9:41AM pct_change would = 0.34 and volume = 0
9:43-46AM pct_change would all = 0.34 and volume = 0.
Is there some cunning feature, SQL statement, function etc within DB2 that would enable me to include such in a stored procedure, to run over 10 years worth of data for multiple stocks ? I've tried, with my very limited knowledge of SQL but the best I can do is fill a 1 minute gap. Where there are multiple minutes missing, I get woeful results.
If some kind soul could give me an example of how this might be achieved I would be very grateful.
Many thanks.
CodePudding user response:
You can do:
select
t.tick, t.data_tick,
coalesce(t.pct_change, l.pct_change, 0) as pct_change,
coalesce(t.volume, l.volume, 0) as volume
from t
left join (
select t.*, (
select tick from t a where data_tick <= t.tick order by data_tick desc
fetch first 1 rows only
) as related_tick
from t
) r on r.tick = t.tick
left join t l on l.tick = r.related_tick
order by tick
Result (with modified data for testing purposes):
TICK DATA_TICK PCT_CHANGE VOLUME
--------- ---------- ----------- ------
09:39:00 0.00 0
09:40:00 09:40:00 0.34 500
09:41:00 0.34 500
09:42:00 09:42:00 0.35 600
09:43:00 0.35 600
09:44:00 0.35 600
09:45:00 0.35 600
09:46:00 0.35 600
09:47:00 09:47:00 0.13 2000
See running example at db<>fiddle.