Multiple subqueries and window operations SQLite


I am having a hard time coming up with the correct query to get the data I need, and I wonder whether it's something that can realistically be done using SQL or if I should deal with it in Python.

My first Table (api_results) looks like this

ticker  entry  date        change
AAPL    2      2022-06-09   1.05
TSLA    7      2022-09-09  -0.34

And the second one (historical_data) looks like this, excluding irrelevant columns.

Ticker  Date        Change
AAPL    2022-06-09   1.05
AAPL    2022-07-09  -0.34
AAPL    2022-08-09   2.5
AAPL    2022-09-09   1.12
AAPL    2022-10-09   ...
AAPL    2022-11-09   ...
MSFT    2022-06-09   ...
MSFT    2022-07-09   ...
MSFT    2022-08-09   ...
MSFT    2022-09-09   ...
MSFT    2022-10-09   ...
MSFT    2022-11-09   ...

For every ticker in api_results, I want to start at its api_results.date and sum the changes over the next 10 entries of historical_data for that ticker. The non-dynamic query for a single ticker is the following:

SELECT ticker, date, SUM(change) AS change FROM
(
    SELECT change, ticker, date FROM historical_data
    WHERE ticker = 'T' AND date >= '2013-12-13 00:00:00'
    ORDER BY date
    LIMIT 10
)

Which returns

Ticker  Date        Change
T       2013-12-13  3.76

The result is correct, but how can I do the same thing for every date-ticker pair in api_results? The idea is to apply this per row of the table. What I would do in Python is this:

import sqlite3
import pandas as pd
from config import db_path

connection = sqlite3.connect(db_path)
cursor = connection.cursor()

historical_data = pd.read_sql("SELECT Date, Ticker, Change FROM historical_data", connection)

api_results = cursor.execute("SELECT ticker, date FROM api_results").fetchall()

data = []

for ticker, date in api_results:
    # Locate the row for this ticker/date pair, then sum the next 10 changes
    mask = (historical_data['Ticker'] == ticker) & (historical_data['Date'] == date)
    index = historical_data.index[mask][0]
    data.append(historical_data.iloc[index:index + 10]['Change'].sum())

This appears to work, but it takes far too long: after two hours the loop was still running. Mind you, historical_data has about 1M rows and 30 columns.

Should I just find a better way of doing it in Python, like vectorizing a function over historical data, or even just building a data frame and shifting the change appropriately, or can this be done using SQL? Thank you for your time and help.
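For reference, the vectorized-pandas route mentioned above can be sketched like this: reverse each ticker's rows so a backward-looking rolling sum becomes a forward-looking one, then join against api_results. This is only a sketch; the tiny sample frames stand in for the real tables, and the column names are taken from the question.

```python
import pandas as pd

# Made-up sample data standing in for historical_data and api_results
historical_data = pd.DataFrame({
    "Ticker": ["AAPL"] * 4 + ["MSFT"] * 4,
    "Date": ["2022-06-09", "2022-07-09", "2022-08-09", "2022-09-09"] * 2,
    "Change": [1.05, -0.34, 2.5, 1.12, 0.5, 0.5, 0.5, 0.5],
})
api_results = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "date": ["2022-06-09", "2022-07-09"],
})

# Reverse row order, so a trailing rolling(10) window per ticker
# becomes "this row plus the next 9 rows" in the original order;
# assignment realigns the result by index automatically.
historical_data["change_10"] = (
    historical_data[::-1]
    .groupby("Ticker")["Change"]
    .rolling(10, min_periods=1)
    .sum()
    .reset_index(level=0, drop=True)
)

# Keep only the ticker/date pairs requested in api_results
result = api_results.merge(
    historical_data,
    left_on=["ticker", "date"],
    right_on=["Ticker", "Date"],
    how="left",
)[["ticker", "date", "change_10"]]

print(result)
```

With 1M rows this does one grouped rolling pass plus one hash join instead of a million Python-level lookups, which is where the two-hour loop was losing its time.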

CodePudding user response:

The query I came up with is the following:

SELECT 
    t1.ticker, t1.date, t2.change_10
FROM 
    api_results AS t1
INNER JOIN (
    SELECT 
        ticker, 
        date, 
        SUM(change) OVER (
            PARTITION BY ticker
            ORDER BY date
            ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING
        ) AS change_10
    FROM 
        historical_data
) AS t2
ON t1.ticker = t2.ticker AND t1.date = t2.date

where the subquery builds, for each row of historical_data, the rolling sum of change over a window of 10 rows starting at the current row, mirroring the iloc[index:index + 10] slice; I called this change_10. The inner join against api_results then keeps only the needed ticker and date combinations. Note that ROWS (not RANGE) is used here: RANGE with an offset requires a numeric ORDER BY expression, and dates stored as text would not qualify.
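Here is a minimal, self-contained way to try the query from Python against an in-memory database (the sample rows are made up; window functions require SQLite 3.25 or newer):

```python
import sqlite3

# In-memory demo of the join + window query with made-up sample rows
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE api_results (ticker TEXT, date TEXT);
CREATE TABLE historical_data (ticker TEXT, date TEXT, change REAL);
INSERT INTO api_results VALUES ('AAPL', '2022-06-09');
INSERT INTO historical_data VALUES
  ('AAPL', '2022-06-09', 1.05),
  ('AAPL', '2022-07-09', -0.34),
  ('AAPL', '2022-08-09', 2.5),
  ('MSFT', '2022-06-09', 9.9);
""")

rows = con.execute("""
SELECT t1.ticker, t1.date, t2.change_10
FROM api_results AS t1
INNER JOIN (
    SELECT ticker, date,
           SUM(change) OVER (
               PARTITION BY ticker
               ORDER BY date
               ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING
           ) AS change_10
    FROM historical_data
) AS t2
ON t1.ticker = t2.ticker AND t1.date = t2.date
""").fetchall()

print(rows)  # one row per api_results entry; change_10 is about 3.21 here
```

The MSFT row never appears in the output because the inner join filters it out, and the window frame never crosses ticker boundaries thanks to PARTITION BY.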
