In Postgresql, suppose I have a 2-col table containing 3 records, namely tA
id | col_A | col_B | type
1 | 1 | 2 | A
2 | 3 | 4 | B
3 | 3 | 1 | A
and a table that contains column weights for each type, named tB
type | col | weight
A | col_A | 0.6
A | col_B | 0.4
B | col_A | 0.4
B | col_B | 0.6
then I want to have a new table that has the same size as tA
, but cols are weighted with tB
. For example, where id=1
, due to type A
, hence col_A*0.6
, col_B*0.4
. The expected result is below
id | col_A | col_B | type
1 | 0.6 | 0.8 | A
2 | 1.8 | 1.6 | B
3 | 1.2 | 0.4 | A
Currently, I wrote a SQL function to handle but I wonder if is there another way that can be done in SQL query?
CodePudding user response:
For a fixed list of column names, you can join
, then do conditional aggregation to weight to each value:
select a.id,
a.col_a * max(b.weight) filter(where b.col = 'colA') col_a,
a.col_b * max(b.weight) filter(where b.col = 'colB') col_b,
a.type
from ta a
inner join tb b using (type)
group by a.id
This would filter out rows on ta
whose types do no exist in tb
. If that’s a concern, you can use left join
instead.
CodePudding user response:
A pivot table would be perfect for this considering how there may be more than two types in the future, possibly dozens? For this you want the tablefunc
extension that comes with Postgres as there is no native support for it like can be found in MS SQL Server or Oracle.
CREATE EXTENSION IF NOT EXISTS tablefunc;
This makes the crosstab
pivot table function available in your database.
With the assumption that your weights are numeric
types that do not suffer from IEEE floating point accuracy issues:
SELECT tA.id
, tA.col_A * ct.weight_A col_A
, tA.col_B * ct.weight_B col_B
, tA.type
FROM tA
JOIN crosstab('SELECT type, col, weight FROM tB ORDER BY type, col')
AS ct(type text, weight_A numeric, weight_B numeric, weight_C numeric)
USING (type)
ORDER BY id
;
Change to double precision
or real
as wanted/needed. See that SELECT
statement in the crosstab
function? This basically says, the first column returned is your row identifier, your grouping. The second returned column are the names of the columns you want returned. The last is the value of the previous column. So ct
becomes a virtual table that changes your tB
table looking like this:
type | col | weight |
---|---|---|
A | col_A | 0.6 |
A | col_B | 0.4 |
B | col_A | 0.4 |
B | col_B | 0.6 |
to look like this instead:
type | weight_A | weight_B | weight_C |
---|---|---|---|
A | 0.6 | 0.4 | [NULL] |
B | 0.4 | 0.6 | [NULL] |
Note how weight_C
never exists in the tB
values? That's okay and useful. It means with a minor change to your query, you can add and remove columns to reflect the changing state of tB
. You can add weight_D
and weight_E
return columns, for example. If they are not present in tB
, they will remain NULL, but if you add the appropriate weights to tB
, they will fill in automatically!
This computed table is what you join against to get your final answer.