NumPy convolve method has slight variance between equivalent for loop method for Volume Weighted Ave

I have a function that calculates the volume-weighted average price (VWAP) of a stock. The first version is a readable, traditional for loop, but it is very slow.

I tried to implement a faster version using NumPy's np.convolve function. It performs SIGNIFICANTLY better (see Results below), but its output differs slightly from that of the for loop version.

I wondered if it was the difference between integer and float division, but both of my VWAP methods are using float division. As of now I'm not sure what is accounting for the difference.

VWAP methods

import numpy as np

def calc_vwap(price, volume, period_lookback):
    """
    Calculates the volume-weighted average price (VWAP) for a given period of time.
    The VWAP is calculated by taking the sum of the product of each price and volume over a given period, 
    and dividing by the sum of the volume over that period.
    
    Parameters:
        price (numpy.ndarray): A list or array of prices.
        volume (numpy.ndarray): A list or array of volumes, corresponding to the prices.
        period_lookback (int): The number of days to look back when calculating VWAP.
        
    Returns:
        numpy.ndarray: An array of VWAP values, one for each day in the input period.
    """
    vwap = np.zeros(len(price))
    for i in range(period_lookback, len(price)):
        lb = i - period_lookback  # lower bound
        ub = i + 1  # upper bound (exclusive), so the window holds period_lookback + 1 values
        volume_sum = volume[lb:ub].sum()
        if volume_sum > 0:
            vwap[i] = (price[lb:ub] * volume[lb:ub]).sum() / volume_sum
        else:
            vwap[i] = np.nan
    return vwap

def calc_vwap_speedy(price, volume, period_lookback):
    # Calculate product of price and volume
    price_volume = price * volume
    # Use convolve to get the rolling sum of product of price and volume and volume array
    price_volume_conv = np.convolve(price_volume, np.ones(period_lookback), mode='valid')
    volume_conv = np.convolve(volume, np.ones(period_lookback), mode='valid')
    # Create a mask to check if the volume sum is greater than 0
    mask = volume_conv > 0
    # Initialize the vwap array
    vwap = np.zeros(len(price))
    # Use the mask to check if volume sum is greater than zero, if it is, proceed with the division and store the result in vwap array, otherwise store NaN
    vwap[period_lookback-1:] = np.where(mask, price_volume_conv / volume_conv, np.nan)
    return vwap

Results

RUN TIME
standard -> 8.046217331999998
speedy ->   0.09436071299998616

OUTPUT
standard -> [0.         0.         0.         ... 0.49073531 0.48826866 0.49220622]
speedy ->   [0.         0.         0.         ... 0.49525183 0.48842067 0.49092021]

CodePudding user response：

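The two methods do not compute the same thing. In the first, the slice volume[lb:ub] covers period_lookback + 1 values because the current value at index i is included in the sum, whereas the np.convolve kernel covers only period_lookback values. A +1 or -1 is missing in one of the methods. See: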

import numpy as np

period_lookback = 1000
np.random.seed(42)
volume = np.random.normal(size=1024)
i = period_lookback
lb = i - period_lookback    # 0
ub = i + 1                  # 1001
res1 = volume[lb:ub].sum()
res2 = np.convolve(volume, np.ones(period_lookback), mode='valid')[0]
res3 = volume[lb:ub-1].sum()
print(res1) # 20.731411258911493
print(res2) # 19.332055822325497
print(res3) # 19.33205582232549
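
The same off-by-one can be seen on a tiny array where the sums are easy to check by hand. This is only an illustrative sketch: the names x and k are made up for the example, with k playing the role of period_lookback.

```python
import numpy as np

x = np.arange(6, dtype=float)   # [0, 1, 2, 3, 4, 5]
k = 3                           # stands in for period_lookback

# mode='valid' returns len(x) - k + 1 sums, each over exactly k values
rolling = np.convolve(x, np.ones(k), mode="valid")

# The for loop's first window is x[0:k + 1], which holds k + 1 values
loop_window = x[0:k + 1].sum()

# A slice of k values reproduces the first convolve output
matching = x[0:k].sum()

print(rolling)       # [ 3.  6.  9. 12.]
print(loop_window)   # 6.0
print(matching)      # 3.0
```

So either the kernel has to grow to k + 1 ones, or the loop's upper bound has to drop by one, for the two windows to agree.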

CodePudding user response：

To match the for loop, the kernel needs period_lookback + 1 ones, and the results are written starting at index period_lookback:

def calc_vwap_speedy(price, volume, period_lookback):
    # Calculate product of price and volume
    price_volume = price * volume
    # Use convolve to get the rolling sum of product of price and volume and volume array
    price_volume_conv = np.convolve(price_volume, np.ones(period_lookback + 1), mode='valid')
    volume_conv = np.convolve(volume, np.ones(period_lookback + 1), mode='valid')
    # Create a mask to check if the volume sum is greater than 0
    mask = volume_conv > 0
    # Initialize the vwap array
    vwap = np.zeros(len(price))
    # Use the mask to check if volume sum is greater than zero, if it is, proceed with the division and store the result in vwap array, otherwise store NaN
    vwap[period_lookback:] = np.where(mask, price_volume_conv/volume_conv, np.nan)
    return vwap
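
A quick way to confirm that the two approaches now agree is to run both on the same random input and compare with np.allclose. The sketch below is a minimal check, not a benchmark: the function names vwap_loop and vwap_convolve, the random test data, and the safe denominator (added here only to avoid divide-by-zero warnings) are assumptions made for this example.

```python
import numpy as np

def vwap_loop(price, volume, period_lookback):
    # Reference version: each window holds period_lookback + 1 values
    vwap = np.zeros(len(price))
    for i in range(period_lookback, len(price)):
        lb, ub = i - period_lookback, i + 1
        volume_sum = volume[lb:ub].sum()
        if volume_sum > 0:
            vwap[i] = (price[lb:ub] * volume[lb:ub]).sum() / volume_sum
        else:
            vwap[i] = np.nan
    return vwap

def vwap_convolve(price, volume, period_lookback):
    # Kernel of period_lookback + 1 ones, matching the loop's window
    kernel = np.ones(period_lookback + 1)
    pv_sum = np.convolve(price * volume, kernel, mode="valid")
    v_sum = np.convolve(volume, kernel, mode="valid")
    mask = v_sum > 0
    # Replace non-positive denominators with 1.0 so the division never warns
    safe = np.where(mask, v_sum, 1.0)
    vwap = np.zeros(len(price))
    vwap[period_lookback:] = np.where(mask, pv_sum / safe, np.nan)
    return vwap

# Made-up test data: prices near 0.5, strictly positive volumes
rng = np.random.default_rng(0)
price = rng.uniform(0.4, 0.6, size=500)
volume = rng.integers(1, 1000, size=500).astype(float)

loop_result = vwap_loop(price, volume, 20)
conv_result = vwap_convolve(price, volume, 20)
print(np.allclose(loop_result, conv_result, equal_nan=True))
```

Any remaining differences should be at the level of floating-point rounding, which is why the comparison uses np.allclose rather than exact equality.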