Currently if you download data from alphavantage, I get broken timestamps
date open high low close volume
2022-08-01 04:15:00 1.00 1.01 0.99 1.00 200
2022-08-01 04:30:00 1.00 1.03 1.00 1.02 300
2022-08-01 05:40:00 1.02 1.04 1.00 1.03 500
as you can see, between 04:30 to 05:45, there is data that I like to fill, so 2022-08-01 04:45, 22-08-01 05:00 etc
I have some conditions too as follows:
- missing date time values of ohlc will take from the last timestamp, so 04:45 ohlc will just be 04:30's data.
- volume to be 0 for those missing lines created.
- to have consistency in future donwloads, I like to specify start and end datetimes. So lets say this stock is AAPL, and I want 0415 to 0930, in future I want TSLA to also have similar timestamps to AAPL.
Anyway, desired output:
date open high low close volume
2022-08-01 04:15:00 1.00 1.01 0.99 1.00 200
2022-08-01 04:30:00 1.00 1.03 1.00 1.02 300
2022-08-01 04:45:00 1.00 1.03 1.00 1.02 0
2022-08-01 05:00:00 1.00 1.03 1.00 1.02 0
2022-08-01 05:15:00 1.00 1.03 1.00 1.02 0
2022-08-01 05:30:00 1.00 1.03 1.00 1.02 0
2022-08-01 05:45:00 1.02 1.04 1.00 1.03 500
Thanks alot, my head hurts...
edit: I tried this, full code here. However this could only fill missing timestamps with last data, volume is also from the last timestamp. This is working code if u can replace the api key with your own:
import pandas as pd
from alpha_vantage.timeseries import TimeSeries
import time
import pytz
from datetime import datetime
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 2000)
api_key = 'pls use your own api key'
def get_eastern_datetime(calctype=1):
# 1 - datetime, 2-date, 3-time
est = pytz.timezone('US/Eastern')
fmt = '%d-%m-%Y %H-%M'
if calctype == 1:
fmt = '%d-%m-%Y %H-%M'
elif calctype == 2:
fmt = '%d-%m-%Y'
elif calctype == 3:
fmt = '%H:%M'
current_date = datetime.now()
return current_date.astimezone(est).strftime(fmt)
def get_alphavantage(ticker):
# Pull data from alpha vantage. 1. search a company name, get the symbol of the match, the pull data.
ts = TimeSeries(key=api_key, output_format='pandas')
tuple_back = ts.get_symbol_search(keywords=ticker)
# print(tuple_back)
symbol = tuple_back[0].iloc[0, 0].strip() # tuple_back[0] is the df. So we choose the 0th line, 0th item which is the symbol.
df, meta_data = ts.get_intraday(symbol=symbol, interval='15min', outputsize='full') # check this line syntax
# print(meta_data)
# Clean up and save
df = df.iloc[::-1] # reverse rows so that earliest on top
df.columns = ['open', 'high', 'low', 'close', 'volume']
df.reset_index(inplace=True) # must, to remove date as index
df['date'] = pd.to_datetime(df['date'], format='%Y-%M-%D %H:%M%S') # '%Y%m%d%H%M'
print(df.head(20), '\n')
# Resample
df = df.set_index('date').sort_index().asfreq(freq='15T', method='ffill')
df.to_csv(r"C:\your own path\{} {}.csv".format(symbol, get_eastern_datetime(calctype=1)))
print(df.head(20), '\n')
if __name__ == '__main__':
get_alphavantage('GME')
CodePudding user response:
If I use .asfreq()
without method="ffill"
then I get rows with nan
and I can use .fillna(0)
for volume
, and .ffill()
for other columns - and I get expected results.
Minimal working example
data = ''' date open high low close volume
2022-08-01 04:15:00 1.00 1.01 0.99 1.00 200
2022-08-01 04:30:00 1.00 1.03 1.00 1.02 300
2022-08-01 05:45:00 1.02 1.04 1.00 1.03 500'''
import pandas as pd
import io
df = pd.read_csv(io.StringIO(data), sep='\s{2,}')
df['date'] = pd.to_datetime(df['date'])
# ---
df = df.set_index('date').sort_index().asfreq(freq='15T')
print('\n--- before ---\n')
print(df)
# ---
df['volume'] = df['volume'].fillna(0)
df['open'] = df['open'].ffill()
df['high'] = df['high'].ffill()
df['low'] = df['low'].ffill()
df['close'] = df['close'].ffill()
print('\n--- after ---\n')
print(df)
Result:
--- before ---
open high low close volume
date
2022-08-01 04:15:00 1.00 1.01 0.99 1.00 200.0
2022-08-01 04:30:00 1.00 1.03 1.00 1.02 300.0
2022-08-01 04:45:00 NaN NaN NaN NaN NaN
2022-08-01 05:00:00 NaN NaN NaN NaN NaN
2022-08-01 05:15:00 NaN NaN NaN NaN NaN
2022-08-01 05:30:00 NaN NaN NaN NaN NaN
2022-08-01 05:45:00 1.02 1.04 1.00 1.03 500.0
--- after ---
open high low close volume
date
2022-08-01 04:15:00 1.00 1.01 0.99 1.00 200.0
2022-08-01 04:30:00 1.00 1.03 1.00 1.02 300.0
2022-08-01 04:45:00 1.00 1.03 1.00 1.02 0.0
2022-08-01 05:00:00 1.00 1.03 1.00 1.02 0.0
2022-08-01 05:15:00 1.00 1.03 1.00 1.02 0.0
2022-08-01 05:30:00 1.00 1.03 1.00 1.02 0.0
2022-08-01 05:45:00 1.02 1.04 1.00 1.03 500.0