Home > database >  Since nested views are seen as taboo - how else should I go about constructing an extremely verbose
Since nested views are seen as taboo - how else should I go about constructing an extremely verbose

Time:05-22

Background: A webdev did not take SQL seriously enough back in college and is now regretting it while working for a financial company using Snowflake as a data warehouse to compute statistics.

We have 3 source tables that are used for all calculations:

  • Positions:
create or replace TABLE POS (
    ACCOUNT_NUMBER VARCHAR(15) NOT NULL,
    ACCOUNT_TYPE VARCHAR(30),
    SECURITY_TYPE VARCHAR(30) NOT NULL,
    SYMBOL VARCHAR(30) NOT NULL,
    QUANTITY NUMBER(15,4),
    AMOUNT NUMBER(15,4),
    FILE_DATE DATE NOT NULL,
    primary key (ACCOUNT_NUMBER, SYMBOL, FILE_DATE)
); 
  • Transactions:
create or replace TABLE TRN (
    REP_CODE VARCHAR(10),
    FILE_DATE DATE NOT NULL,
    ACCOUNT_NUMBER VARCHAR(15) NOT NULL,
    CODE VARCHAR(10),
    CANCEL_STATUS_FLAG VARCHAR(1),
    SYMBOL VARCHAR(100),
    SECURITY_CODE VARCHAR(2),
    TRADE_DATE DATE,
    QUANTITY NUMBER(15,4),
    NET_AMOUNT NUMBER(15,4),
    PRINCIPAL NUMBER(15,4),
    BROKER_FEES NUMBER(15,4),
    OTHER_FEES NUMBER(15,4),
    SETTLE_DATE DATE,
    FROM_TO_ACCOUNT VARCHAR(30),
    ACCOUNT_TYPE VARCHAR(30),
    ACCRUED_INTEREST NUMBER(15,4),
    CLOSING_ACCOUNT_METHOD VARCHAR(30),
    DESCRIPTION VARCHAR(500)
); 
  • Prices:
create or replace TABLE PRI (
    SYMBOL VARCHAR(100) NOT NULL,
    SECURITY_TYPE VARCHAR(2) NOT NULL,
    FILE_DATE DATE NOT NULL,
    PRICE NUMBER(15,4) NOT NULL,
    FACTOR NUMBER(15,10),
    primary key (SYMBOL, FILE_DATE)
); 

These tables on their own are all effectively useless and messy, they almost always need to be joined with one another (or themselves) and have many additional calculations applied to them to be interpreted in any meaningful way. Views have helped me encapsulate this issue.

There are two core views that I use downstream from these tables:

  1. Holdings
SELECT 
    POS.FILE_DATE, 
    POS.ACCOUNT_NUMBER, 
    POS.SYMBOL,
    CASE WHEN POS.QUANTITY > 0 THEN POS.QUANTITY ELSE POS.AMOUNT END AS QUANTITY,
    CASE WHEN POS.SECURITY_TYPE IN ('FI', 'MB', 'UI') THEN
        COALESCE(
            PRI.FACTOR * PRI.PRICE * .01,
            PRI.PRICE * .01
        )
        ELSE PRI.PRICE END AS PPU,
    COALESCE(
        POS.AMOUNT,
        QUANTITY * PPU
    ) AS MARKET_VALUE
FROM POS AS POS 
LEFT JOIN PRI AS PRI 
    ON POS.FILE_DATE = PRI.FILE_DATE AND POS.SYMBOL = PRI.SYMBOL; 

  1. Cashflows (this one a is a doozy... our data provider really doesn't help much here)
select t.file_date, T.ACCOUNT_NUMBER,
    COALESCE (
        CASE WHEN T.SECURITY_CODE = 'MB' THEN INIT * p.factor * .01 ELSE NULL END, -- IF Factor and Par needed
        CASE WHEN T.SECURITY_CODE IN ('FI', 'UI') THEN INIT * .01 ELSE NULL END, -- if par val needed
        CASE WHEN T.QUANTITY > 0 AND P.PRICE > 0 THEN t.quantity * p.price ELSE NULL END,
        CASE WHEN T.NET_AMOUNT > 0 and p.price is not null THEN T.NET_AMOUNT * p.price ELSE NULL END,
        T.NET_AMOUNT, -- if the transaction has a net value
        BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
        SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
    ) AS DERIVED, -- this records the initial cash flow value
    COALESCE( 
        CASE WHEN t.code IN ('DEP', 'REC') THEN DERIVED ELSE NULL END,
        CASE WHEN t.code IN ('WITH', 'DEL', 'FRTAX', 'EXABP') THEN -1 * DERIVED ELSE NULL END
    ) as DIRECTION, -- this determines if it was an inflow or outflow
    CASE 
        WHEN T.CANCEL_STATUS_FLAG = 'Y' THEN -1*DIRECTION 
        ELSE DIRECTION 
    END AS FLOW, -- this cancels out an existing transaction
    CASE WHEN T.CODE = 'MFEE' THEN INIT ELSE NULL END AS FEES,
    t.code, 
    t.symbol, 
    t.net_amount, 
    t.quantity, 
    p.price,
    p.factor
from trn t
LEFT JOIN PRI p 
    ON t.symbol = p.symbol 
    AND t.file_date = p.file_date
-- in the rare case that we dont have a securities price, it means that a buy/sell 
-- transaction occurred to remove the position from our 
-- data feed. This must mean that the transaction value 
-- is equivalent to the total internal operation that occurred to a particular security in 
-- this account on this day.
LEFT JOIN (
    select file_date, 
        account_number, 
        symbol, 
        SUM(net_amount) as net_amount 
    from TRN 
    where code = 'BUY' 
    group by file_date, account_number, symbol
) AS buys 
    ON t.code = 'DEL'   
    AND buys.file_date = t.file_date  
    AND buys.symbol = t.symbol  
    AND buys.account_number = t.account_number
    AND p.price IS NULL
    AND t.net_amount = 0
    AND buys.net_amount != 0
LEFT JOIN (
    select file_date, 
        account_number, 
        symbol, 
        SUM(net_amount) as net_amount 
    from TRN 
    where code = 'SELL' 
    group by file_date, account_number, symbol
) AS sells 
    ON t.code = 'REC' 
    AND t.file_date = sells.file_date 
    AND sells.symbol = t.symbol 
    AND sells.account_number = t.account_number
    AND p.price IS NULL
    AND t.net_amount = 0
    AND sells.net_amount != 0
WHERE 
    t.code in ('DEP', 'WITH', 'DEL', 'REC', 'FRTAX', 'MFEE', 'EXABP')
ORDER BY t.file_date; 

I also wrote views to group the two views above by their account number, named account_balances and grouped_cashflows, respectively. I call these two views often from my application layer, and have been satisfied with the execution speed thus far.

With all of that out of the way....

I am now attempting to compute the time weighted performance of each investment account. Id prefer to do this using SQL instead of in the application layer so that I can view the output in those sweet sweet Snowflake Dashboards.

The formula I am using is known as enter image description here

Thank you!

CodePudding user response:

So far I only see tiny things that I don't think will stack up to anything large.

From holdings:

    CASE WHEN POS.SECURITY_TYPE IN ('FI', 'MB', 'UI') THEN
        COALESCE(
            PRI.FACTOR * PRI.PRICE * .01,
            PRI.PRICE * .01
        )
        ELSE PRI.PRICE END AS PPU,

the a two legged CASE in snowflake is the same as using IFF and IFF is a little easier to read, imho. And the math can be tweaked.

    IFF(POS.SECURITY_TYPE IN ('FI', 'MB', 'UI'),
        PRI.PRICE * .01 * COALESCE(PRI.FACTOR, 1),
        PRI.PRICE) AS PPU,

is cashflow, the large COALESCE for derived, could become a CASE statement, but perhaps that would not be faster:

thus:

    COALESCE (
        IFF( T.SECURITY_CODE = 'MB', INIT * p.factor * .01, NULL), -- IF Factor and Par needed
        IFF( T.SECURITY_CODE IN ('FI', 'UI'), INIT * .01, NULL), -- if par val needed
        IFF( T.QUANTITY > 0 AND P.PRICE > 0, t.quantity * p.price, NULL),
        IFF( T.NET_AMOUNT > 0 and p.price is not null, T.NET_AMOUNT * p.price, NULL),
        T.NET_AMOUNT, -- if the transaction has a net value
        BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
        SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
    ) AS DERIVED, -- this records the initial cash flow value

could be

    CASE 
        WHEN T.SECURITY_CODE = 'MB' THEN INIT * p.factor * .01
        WHEN T.SECURITY_CODE IN ('FI', 'UI') THEN INIT * .01
        WHEN T.QUANTITY > 0 AND P.PRICE > 0 THEN t.quantity * p.price
        WHEN T.NET_AMOUNT > 0 and p.price is not null THEN T.NET_AMOUNT * p.price
        ELSE COALESCE(
            T.NET_AMOUNT, -- if the transaction has a net value
            BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
            SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
        ) 
    END AS DERIVED, -- this records the initial cash flow value

Hmm this might be somethings.

In cashflow, you make buys and sells and you only left join those aggregations, if the t.net_amount = 0 BUT only use those values in:

        ELSE COALESCE(
            T.NET_AMOUNT, -- if the transaction has a net value
            BUYS.NET_AMOUNT, -- if there is a buy aggregate match for the day
            SELLS.NET_AMOUNT -- if there is a sell aggregate match for the day
        ) 

COALESCE will only use those values if t.net_amount is null. But those values will only be present if t.net_amount is zero so buys and sells is 100% wasted compute. So ether the join should be t.net_amount is null or those can be dropped.

Then there is things like

CASE WHEN T.CODE = 'MFEE' THEN INIT ELSE NULL END AS FEES

and later that is coalesced to zero if null, (which might handle the left joins also). but it could just be zero here. But also it points that T.CODE can equal 'MFEE' and DIRECTION does not handle this, so direction can be null, thus FLOW can be null.

  • Related