Seeking suggestions for how to do the following in SQL (I have an approach but it seems very inefficient, described below). I have a ‘transactions’ table that has the following general structure – each row pertains to a particular ‘order ID’ and ‘transaction type’, with columns of information about the transaction:
TYPE | ORDER_NO | revenue | shipping | rebates |
---|---|---|---|---|
SALE | 100001 | $55.22 | $7.00 | $(5.22) |
REFUND | 100001 | $(55.22) | $0 | $5.22 |
ADJUSTMENT | 100001 | $(8.00) | $(2.00) | $0 |
SALE | 100005 | $40.00 | $4.00 | $0 |
REFUND | 100005 | $(32.00) | $(4.00) | $0 |
I need to categorize the values in columns for accounting purposes, but each column has separate categorization logic depending on the value of TYPE for the row. In the extreme case where all values in 'order # 10001' above ended up in separate categories, the desired result would look like this:
TYPE | ORDER_NO | revenue | shipping | rebates | accounting_code |
---|---|---|---|---|---|
SALE | 100001 | $55.22 | 101 | ||
SALE | 100001 | $7.00 | 102 | ||
SALE | 100001 | $(5.22) | 103 | ||
REFUND | 100001 | $(55.22) | 104 | ||
REFUND | 100001 | $5.22 | 105 | ||
ADJUSTMENT | 100001 | $(8.00) | 106 | ||
ADJUSTMENT | 100001 | $(2.00) | 107 |
My approach has been to pull the transactions table as a temp table, and then have a series of sub-queries each pulling only one of the columns (and 0 for the others) and assigning the category based on TYPE and whichever column is being pulled in that sub query. I then union all the results back together (sample query below). But this seems cumbersome (as verified by performance). Thanks in advance for any thoughts on more efficient ways to approach this.
Current query (a concatenation of TYPE and the hard-coded column name stands in for accounting code here - e.g. 'SALES | revenue'):
with TT as
(select * from transactions)
select
TYPE,
ORDER_no,
sum(revenue) as revenue,
0 as shipping,
0 as rebates,
CONCAT (type,' | ', 'revenue') as acc_code
FROM TT
group by
TYPE,
Order_no,
CONCAT (type,' | ', 'revenue')
UNION
…(repeat process for next column, setting others to 0)
CodePudding user response:
I think I recommend considering something like the following, at least as a starting point:
WITH TT AS (
SELECT
type,
order_no,
sum(revenue) as revenue,
sum(shipping) as shipping,
sum(rebates) as rebates
FROM
transactions
GROUP BY
type, order_no
)
SELECT type, order_no, revenue, 0 shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, 0 shipping, rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
The CTE TT
will do the group by
once, likely improving performance (I'm skeptical your current CTE is helping performance-wise). If performance is still a concern (as per your comments), an index on (type, order_no)
might help optimize the group by
. Besides performance, this version of the CTE reduces the complexity of each of the main queries that aren't union
ed together (e.g. the group by
isn't repeated) making it easier to read and maintain.
Note that this handles the acc_code
like your sample query does (as opposed to your expected output). You could adjust it if necessary, but that seems tangential to your question.
It's a little unclear what data types the revenue/shipping/rebates are. I'm assuming it's safe to sum
them as numerics, since your existing query sum
s them and it's implied that your existing query works.