I have a method for calculating the volume-weighted average price (VWAP) of a stock. One version is a readable, traditional for loop, but it is very slow.
I have tried to implement a version using NumPy's np.convolve. It performs significantly better (see Results below), but the output values differ slightly from those of the standard for loop.
I wondered if it was the difference between integer and float division, but both of my VWAP methods are using float division. As of now I'm not sure what is accounting for the difference.
VWAP methods
import numpy as np

def calc_vwap(price, volume, period_lookback):
    """
    Calculates the volume-weighted average price (VWAP) for a given period of time.
    The VWAP is calculated by taking the sum of the product of each price and volume over a given period,
    and dividing by the sum of the volume over that period.
    Parameters:
    price (numpy.ndarray): A list or array of prices.
    volume (numpy.ndarray): A list or array of volumes, corresponding to the prices.
    period_lookback (int): The number of days to look back when calculating VWAP.
    Returns:
    numpy.ndarray: An array of VWAP values, one for each day in the input period.
    """
    vwap = np.zeros(len(price))
    for i in range(period_lookback, len(price)):
        lb = i - period_lookback  # lower bound
        ub = i + 1  # upper bound (exclusive), so the slice includes index i
        volume_sum = volume[lb:ub].sum()
        if volume_sum > 0:
            vwap[i] = (price[lb:ub] * volume[lb:ub]).sum() / volume_sum
        else:
            vwap[i] = np.nan
    return vwap
def calc_vwap_speedy(price, volume, period_lookback):
    # Calculate product of price and volume
    price_volume = price * volume
    # Use convolve to get the rolling sum of product of price and volume and volume array
    price_volume_conv = np.convolve(price_volume, np.ones(period_lookback), mode='valid')
    volume_conv = np.convolve(volume, np.ones(period_lookback), mode='valid')
    # Create a mask to check if the volume sum is greater than 0
    mask = volume_conv > 0
    # Initialize the vwap array
    vwap = np.zeros(len(price))
    # Where the volume sum is greater than zero, store the quotient; otherwise store NaN
    vwap[period_lookback-1:] = np.where(mask, price_volume_conv / volume_conv, np.nan)
    return vwap
Results
RUN TIME
standard -> 8.046217331999998
speedy -> 0.09436071299998616
OUTPUT
standard -> [0. 0. 0. ... 0.49073531 0.48826866 0.49220622]
speedy -> [0. 0. 0. ... 0.49525183 0.48842067 0.49092021]
CodePudding user response:
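The two methods do not compute the same thing. In the first, the last value is included in the sum (the slice volume[lb:ub] covers period_lookback + 1 elements), while that is not the case for np.convolve with np.ones(period_lookback), which sums only period_lookback elements. A +1 or -1 is missing in one of the methods. See: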
import numpy as np
period_lookback = 1000
np.random.seed(42)
volume = np.random.normal(size=1024)
i = period_lookback
lb = i - period_lookback # 0
ub = i + 1 # 1001
res1 = volume[lb:ub].sum()
res2 = np.convolve(volume, np.ones(period_lookback), mode='valid')[0]
res3 = volume[lb:ub-1].sum()
print(res1) # 20.731411258911493
print(res2) # 19.332055822325497
print(res3) # 19.33205582232549
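To make the window sizes easier to see, here is a minimal sketch of the same comparison on a tiny array. The array contents and the lookback of 3 are made up for illustration; only the slicing and the np.convolve call mirror the code above.

```python
import numpy as np

# Hypothetical toy data, chosen so the sums are easy to check by hand
period_lookback = 3
volume = np.arange(10, dtype=float)  # 0.0, 1.0, ..., 9.0

i = period_lookback
lb = i - period_lookback  # 0
ub = i + 1                # 4

# The loop's slice covers indices 0..3, i.e. period_lookback + 1 elements
loop_window = volume[lb:ub]

# The convolve kernel has only period_lookback elements, so it sums indices 0..2
conv_first = np.convolve(volume, np.ones(period_lookback), mode='valid')[0]

print(len(loop_window))   # 4
print(loop_window.sum())  # 6.0 (0 + 1 + 2 + 3)
print(conv_first)         # 3.0 (0 + 1 + 2)
```

The loop's window is one element longer than the convolution kernel, which is the whole discrepancy.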
CodePudding user response:
This can be done correctly as follows:
def calc_vwap_speedy(price, volume, period_lookback):
    # Calculate product of price and volume
    price_volume = price * volume
    # Use convolve to get the rolling sums; the window has period_lookback + 1 elements to match the loop
    price_volume_conv = np.convolve(price_volume, np.ones(period_lookback + 1), mode='valid')
    volume_conv = np.convolve(volume, np.ones(period_lookback + 1), mode='valid')
    # Create a mask to check if the volume sum is greater than 0
    mask = volume_conv > 0
    # Initialize the vwap array
    vwap = np.zeros(len(price))
    # Where the volume sum is greater than zero, store the quotient; otherwise store NaN
    vwap[period_lookback:] = np.where(mask, price_volume_conv / volume_conv, np.nan)
    return vwap
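One way to check that the corrected version agrees with the loop is to run both on the same data and compare with np.allclose, which tolerates the tiny floating-point differences between the two summation orders. This is only a sketch: the names vwap_loop and vwap_convolve are shortened stand-ins for the two functions in this thread, and the random prices and volumes are invented test data.

```python
import numpy as np

def vwap_loop(price, volume, period_lookback):
    # Stand-in for the question's loop: window covers indices lb..i inclusive
    vwap = np.zeros(len(price))
    for i in range(period_lookback, len(price)):
        lb, ub = i - period_lookback, i + 1
        volume_sum = volume[lb:ub].sum()
        if volume_sum > 0:
            vwap[i] = (price[lb:ub] * volume[lb:ub]).sum() / volume_sum
        else:
            vwap[i] = np.nan
    return vwap

def vwap_convolve(price, volume, period_lookback):
    # Stand-in for the corrected version: kernel of period_lookback + 1 ones
    window = np.ones(period_lookback + 1)
    pv_sum = np.convolve(price * volume, window, mode='valid')
    v_sum = np.convolve(volume, window, mode='valid')
    vwap = np.zeros(len(price))
    vwap[period_lookback:] = np.where(v_sum > 0, pv_sum / v_sum, np.nan)
    return vwap

# Invented sample data: strictly positive volumes, so no NaN branch is hit
rng = np.random.default_rng(0)
price = rng.uniform(10.0, 20.0, size=500)
volume = rng.integers(1, 1000, size=500).astype(float)

a = vwap_loop(price, volume, 20)
b = vwap_convolve(price, volume, 20)
print(np.allclose(a, b, equal_nan=True))
```

Exact equality (a == b) may still fail on some elements because of rounding, so a tolerance-based comparison is the safer check here.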