Home > other >  Using SQL to categorize multiple columns in a row, each with their own categorization logic
Using SQL to categorize multiple columns in a row, each with their own categorization logic

Time:01-26

Seeking suggestions for how to do the following in SQL (I have an approach but it seems very inefficient, described below). I have a ‘transactions’ table that has the following general structure – each row pertains to a particular ‘order ID’ and ‘transaction type’, with columns of information about the transaction:

TYPE ORDER_NO revenue shipping rebates
SALE 100001 $55.22 $7.00 $(5.22)
REFUND 100001 $(55.22) $0 $5.22
ADJUSTMENT 100001 $(8.00) $(2.00) $0
SALE 100005 $40.00 $4.00 $0
REFUND 100005 $(32.00) $(4.00) $0

I need to categorize the values in columns for accounting purposes, but each column has separate categorization logic depending on the value of TYPE for the row. In the extreme case where all values in 'order # 10001' above ended up in separate categories, the desired result would look like this:

TYPE ORDER_NO revenue shipping rebates accounting_code
SALE 100001 $55.22 101
SALE 100001 $7.00 102
SALE 100001 $(5.22) 103
REFUND 100001 $(55.22) 104
REFUND 100001 $5.22 105
ADJUSTMENT 100001 $(8.00) 106
ADJUSTMENT 100001 $(2.00) 107

My approach has been to pull the transactions table as a temp table, and then have a series of sub-queries each pulling only one of the columns (and 0 for the others) and assigning the category based on TYPE and whichever column is being pulled in that sub query. I then union all the results back together (sample query below). But this seems cumbersome (as verified by performance). Thanks in advance for any thoughts on more efficient ways to approach this.

Current query (a concatenation of TYPE and the hard-coded column name stands in for accounting code here - e.g. 'SALES | revenue'):

with TT as
   (select * from transactions)
select 
   TYPE,
   ORDER_no,
   sum(revenue) as revenue,
   0 as shipping,
   0 as rebates,
   CONCAT (type,' | ', 'revenue') as acc_code

FROM TT

group by 
   TYPE,
   Order_no,
   CONCAT (type,' | ', 'revenue') 

UNION

…(repeat process for next column, setting others to 0)

CodePudding user response:

I think I recommend considering something like the following, at least as a starting point:

WITH TT AS (
 SELECT
  type,
  order_no,
  sum(revenue) as revenue,
  sum(shipping) as shipping,
  sum(rebates) as rebates
 FROM 
  transactions
 GROUP BY
  type, order_no
)
SELECT type, order_no, revenue, 0 shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, 0 shipping, rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT

The CTE TT will do the group by once, likely improving performance (I'm skeptical your current CTE is helping performance-wise). If performance is still a concern (as per your comments), an index on (type, order_no) might help optimize the group by. Besides performance, this version of the CTE reduces the complexity of each of the main queries that aren't unioned together (e.g. the group by isn't repeated) making it easier to read and maintain.

Note that this handles the acc_code like your sample query does (as opposed to your expected output). You could adjust it if necessary, but that seems tangential to your question.

It's a little unclear what data types the revenue/shipping/rebates are. I'm assuming it's safe to sum them as numerics, since your existing query sums them and it's implied that your existing query works.

  • Related