Home > front end >  alphavantage timeseries, fill missing datetime, volume of filled to be 0
alphavantage timeseries, fill missing datetime, volume of filled to be 0

Time:09-15

Currently if you download data from alphavantage, I get broken timestamps

    date               open     high    low    close    volume
2022-08-01 04:15:00    1.00     1.01    0.99   1.00     200
2022-08-01 04:30:00    1.00     1.03    1.00   1.02     300
2022-08-01 05:40:00    1.02     1.04    1.00   1.03     500

as you can see, between 04:30 to 05:45, there is data that I like to fill, so 2022-08-01 04:45, 22-08-01 05:00 etc

I have some conditions too as follows:

  1. missing date time values of ohlc will take from the last timestamp, so 04:45 ohlc will just be 04:30's data.
  2. volume to be 0 for those missing lines created.
  3. to have consistency in future donwloads, I like to specify start and end datetimes. So lets say this stock is AAPL, and I want 0415 to 0930, in future I want TSLA to also have similar timestamps to AAPL.

Anyway, desired output:

 date                  open     high    low    close    volume
2022-08-01 04:15:00    1.00     1.01    0.99   1.00     200
2022-08-01 04:30:00    1.00     1.03    1.00   1.02     300
2022-08-01 04:45:00    1.00     1.03    1.00   1.02     0
2022-08-01 05:00:00    1.00     1.03    1.00   1.02     0
2022-08-01 05:15:00    1.00     1.03    1.00   1.02     0
2022-08-01 05:30:00    1.00     1.03    1.00   1.02     0
2022-08-01 05:45:00    1.02     1.04    1.00   1.03     500

Thanks alot, my head hurts...

edit: I tried this, full code here. However this could only fill missing timestamps with last data, volume is also from the last timestamp. This is working code if u can replace the api key with your own:

import pandas as pd
from alpha_vantage.timeseries import TimeSeries
import time
import pytz
from datetime import datetime
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 2000)
api_key = 'pls use your own api key'


def get_eastern_datetime(calctype=1):
    # 1 - datetime, 2-date, 3-time
    est = pytz.timezone('US/Eastern')
    fmt = '%d-%m-%Y %H-%M'
    if calctype == 1:
        fmt = '%d-%m-%Y %H-%M'
    elif calctype == 2:
        fmt = '%d-%m-%Y'
    elif calctype == 3:
        fmt = '%H:%M'
    current_date = datetime.now()
    return current_date.astimezone(est).strftime(fmt)


def get_alphavantage(ticker):
    # Pull data from alpha vantage. 1. search a company name, get the symbol of the match, the pull data.
    ts = TimeSeries(key=api_key, output_format='pandas')
    tuple_back = ts.get_symbol_search(keywords=ticker)
    # print(tuple_back)
    symbol = tuple_back[0].iloc[0, 0].strip()  # tuple_back[0] is the df. So we choose the 0th line, 0th item which is the symbol.
    df, meta_data = ts.get_intraday(symbol=symbol, interval='15min', outputsize='full')  # check this line syntax
    # print(meta_data)

    # Clean up and save
    df = df.iloc[::-1]  # reverse rows so that earliest on top
    df.columns = ['open', 'high', 'low', 'close', 'volume']
    df.reset_index(inplace=True)  # must, to remove date as index
    df['date'] = pd.to_datetime(df['date'], format='%Y-%M-%D %H:%M%S')  # '%Y%m%d%H%M'
    print(df.head(20), '\n')

    # Resample
    df = df.set_index('date').sort_index().asfreq(freq='15T', method='ffill')
    df.to_csv(r"C:\your own path\{} {}.csv".format(symbol, get_eastern_datetime(calctype=1)))
    print(df.head(20), '\n')


if __name__ == '__main__':
    get_alphavantage('GME')

CodePudding user response:

If I use .asfreq() without method="ffill" then I get rows with nan and I can use .fillna(0) for volume, and .ffill() for other columns - and I get expected results.

Minimal working example

data = '''    date               open     high    low    close    volume
2022-08-01 04:15:00    1.00     1.01    0.99   1.00     200
2022-08-01 04:30:00    1.00     1.03    1.00   1.02     300
2022-08-01 05:45:00    1.02     1.04    1.00   1.03     500'''

import pandas as pd
import io

df = pd.read_csv(io.StringIO(data), sep='\s{2,}')
df['date'] = pd.to_datetime(df['date'])

# ---

df = df.set_index('date').sort_index().asfreq(freq='15T')

print('\n--- before ---\n')
print(df)

# ---

df['volume'] = df['volume'].fillna(0)

df['open'] = df['open'].ffill()
df['high'] = df['high'].ffill()
df['low'] = df['low'].ffill()
df['close'] = df['close'].ffill()

print('\n--- after ---\n')
print(df)

Result:

--- before ---

                     open  high   low  close  volume
date                                                
2022-08-01 04:15:00  1.00  1.01  0.99   1.00   200.0
2022-08-01 04:30:00  1.00  1.03  1.00   1.02   300.0
2022-08-01 04:45:00   NaN   NaN   NaN    NaN     NaN
2022-08-01 05:00:00   NaN   NaN   NaN    NaN     NaN
2022-08-01 05:15:00   NaN   NaN   NaN    NaN     NaN
2022-08-01 05:30:00   NaN   NaN   NaN    NaN     NaN
2022-08-01 05:45:00  1.02  1.04  1.00   1.03   500.0

--- after ---

                     open  high   low  close  volume
date                                                
2022-08-01 04:15:00  1.00  1.01  0.99   1.00   200.0
2022-08-01 04:30:00  1.00  1.03  1.00   1.02   300.0
2022-08-01 04:45:00  1.00  1.03  1.00   1.02     0.0
2022-08-01 05:00:00  1.00  1.03  1.00   1.02     0.0
2022-08-01 05:15:00  1.00  1.03  1.00   1.02     0.0
2022-08-01 05:30:00  1.00  1.03  1.00   1.02     0.0
2022-08-01 05:45:00  1.02  1.04  1.00   1.03   500.0
  • Related