Home > Software engineering >  How to weight cols for each records using SQL only
How to weight cols for each records using SQL only

Time:01-03

In Postgresql, suppose I have a 2-col table containing 3 records, namely tA

id | col_A | col_B | type
1  | 1     | 2     | A  
2  | 3     | 4     | B
3  | 3     | 1     | A

and a table that contains column weights for each type, named tB

type | col   | weight
A    | col_A | 0.6
A    | col_B | 0.4
B    | col_A | 0.4
B    | col_B | 0.6

then I want to have a new table that has the same size as tA, but cols are weighted with tB. For example, where id=1, due to type A, hence col_A*0.6, col_B*0.4. The expected result is below

id | col_A | col_B | type
1  | 0.6   | 0.8   | A  
2  | 1.8   | 1.6   | B
3  | 1.2   | 0.4   | A

Currently, I wrote a SQL function to handle but I wonder if is there another way that can be done in SQL query?

CodePudding user response:

For a fixed list of column names, you can join, then do conditional aggregation to weight to each value:

select a.id, 
    a.col_a * max(b.weight) filter(where b.col = 'colA') col_a,
    a.col_b * max(b.weight) filter(where b.col = 'colB') col_b,
    a.type
from ta a
inner join tb b using (type)
group by a.id

This would filter out rows on ta whose types do no exist in tb. If that’s a concern, you can use left join instead.

CodePudding user response:

A pivot table would be perfect for this considering how there may be more than two types in the future, possibly dozens? For this you want the tablefunc extension that comes with Postgres as there is no native support for it like can be found in MS SQL Server or Oracle.

CREATE EXTENSION IF NOT EXISTS tablefunc;

This makes the crosstab pivot table function available in your database.

With the assumption that your weights are numeric types that do not suffer from IEEE floating point accuracy issues:

SELECT tA.id
     , tA.col_A * ct.weight_A col_A
     , tA.col_B * ct.weight_B col_B
     , tA.type
  FROM tA
  JOIN crosstab('SELECT type, col, weight FROM tB ORDER BY type, col')
       AS ct(type text, weight_A numeric, weight_B numeric, weight_C numeric)
       USING (type)
 ORDER BY id
;

Change to double precision or real as wanted/needed. See that SELECT statement in the crosstab function? This basically says, the first column returned is your row identifier, your grouping. The second returned column are the names of the columns you want returned. The last is the value of the previous column. So ct becomes a virtual table that changes your tB table looking like this:

type col weight
A col_A 0.6
A col_B 0.4
B col_A 0.4
B col_B 0.6

to look like this instead:

type weight_A weight_B weight_C
A 0.6 0.4 [NULL]
B 0.4 0.6 [NULL]

Note how weight_C never exists in the tB values? That's okay and useful. It means with a minor change to your query, you can add and remove columns to reflect the changing state of tB. You can add weight_D and weight_E return columns, for example. If they are not present in tB, they will remain NULL, but if you add the appropriate weights to tB, they will fill in automatically!

This computed table is what you join against to get your final answer.

  • Related