Home > Back-end >  DB2 - How to fill gaps in Intraday Stock Data
DB2 - How to fill gaps in Intraday Stock Data

Time:03-31

I have some old intraday 1m bar data that I am trying to clean up for an application I am using, that requires 390 1m bar ticks (9:30am - 16:00pm) of intraday data, per day, per stock. Unfortunately the older the data, the more gaps appear as the data only contains that which actually traded during said minute. ie: no trades = no volume = no data.

PCT_CHANGE is calculated by the underlying (CLOSE-OPEN)/OPEN*100 not shown, and reflects the percentage change since the OPEN to that of the CLOSE of the current TICK.

Example:

TICK        DATA_TICK  PCT_CHANGE  VOLUME
9:30:00 AM          
9:31:00 AM          
9:32:00 AM          
9:33:00 AM          
9:34:00 AM          
9:35:00 AM  9:35:00 AM  0       15500
9:36:00 AM  9:36:00 AM  0.06    1500
9:37:00 AM          
9:38:00 AM  9:38:00 AM  0.24    4000
9:39:00 AM  9:39:00 AM  0.2     4500
9:40:00 AM  9:40:00 AM  0.34    500
9:41:00 AM          
9:42:00 AM  9:42:00 AM  0.34    500
9:43:00 AM          
9:44:00 AM          
9:45:00 AM          
9:46:00 AM          
9:47:00 AM  9:47:00 AM  0.13    2000
9:48:00 AM  9:48:00 AM  0.13    1000
9:49:00 AM          
9:50:00 AM  9:50:00 AM  0.22    500
9:51:00 AM  9:51:00 AM  0.24    2500
9:52:00 AM  9:52:00 AM  0.24    1000
9:53:00 AM  9:53:00 AM  0.24    2000
9:54:00 AM          
9:55:00 AM  9:55:00 AM  0.13    500
9:56:00 AM          
9:57:00 AM  9:57:00 AM  0.13    2000
9:58:00 AM  9:58:00 AM  0.24    2000
9:59:00 AM  9:59:00 AM  0.24    500
10:00:00 AM 10:00:00 AM 0.13    500

So what I would like to be able to do is populate the missing 1m intervals PCT_CHANGE value with that of the previous existing row, and in the case of the opening minutes, then 0 (zero). Volume for all inserted values would be 0.

example

9:41AM pct_change would = 0.34 and volume = 0

9:43-46AM pct_change would all = 0.34 and volume = 0.

Is there some cunning feature, SQL statement, function etc within DB2 that would enable me to include such in a stored procedure, to run over 10 years worth of data for multiple stocks ? I've tried, with my very limited knowledge of SQL but the best I can do is fill a 1 minute gap. Where there are multiple minutes missing, I get woeful results.

If some kind soul could give me an example of how this might be achieved I would be very grateful.

Many thanks.

CodePudding user response:

You can do:

select 
  t.tick, t.data_tick,
  coalesce(t.pct_change, l.pct_change, 0) as pct_change,
  coalesce(t.volume, l.volume, 0) as volume
from t
left join (
  select t.*, (
    select tick from t a where data_tick <= t.tick order by data_tick desc 
    fetch first 1 rows only
  ) as related_tick
  from t
) r on r.tick = t.tick
left join t l on l.tick = r.related_tick
order by tick

Result (with modified data for testing purposes):

 TICK      DATA_TICK  PCT_CHANGE  VOLUME 
 --------- ---------- ----------- ------ 
 09:39:00             0.00        0      
 09:40:00  09:40:00   0.34        500    
 09:41:00             0.34        500    
 09:42:00  09:42:00   0.35        600    
 09:43:00             0.35        600    
 09:44:00             0.35        600    
 09:45:00             0.35        600    
 09:46:00             0.35        600    
 09:47:00  09:47:00   0.13        2000   

See running example at db<>fiddle.

  • Related