I'm trying to perform a query that returns an aggregation of values from the same table with information from others through a foreign key, but I can't. In the example below, I wanted to return the total sales by state on 2020-01-01 and 2021-01-01, showing the name of the state.
Tables script:
CREATE TABLE IF NOT EXISTS estado (
id SERIAL PRIMARY KEY,
estado VARCHAR(100)
)
CREATE TABLE IF NOT EXISTS municipio (
id SERIAL PRIMARY KEY,
estado integer REFERENCES estado(id),
municipio VARCHAR(100)
)
CREATE TABLE IF NOT EXISTS vendas (
id SERIAL PRIMARY KEY,
municipio integer REFERENCES municipio(id),
valor numeric,
data_venda date
)
INSERT INTO estado VALUES (1, 'PR');
INSERT INTO estado VALUES (2, 'SC');
INSERT INTO estado VALUES (3, 'RS');
INSERT INTO municipio VALUES (1, 1, 'Pelotas');
INSERT INTO municipio VALUES (2, 1, 'Caxias do Sul');
INSERT INTO municipio VALUES (3, 1, 'Porto Alegre');
INSERT INTO municipio VALUES (4, 2, 'Florianopolis');
INSERT INTO municipio VALUES (5, 2, 'Chapeco');
INSERT INTO municipio VALUES (6, 2, 'Itajai');
INSERT INTO municipio VALUES (7, 3, 'Curitiba');
INSERT INTO municipio VALUES (8, 3, 'Maringa');
INSERT INTO municipio VALUES (9, 3, 'Foz do Iguaçu');
INSERT INTO vendas VALUES (1, 6, 5, '2020-01-01');
INSERT INTO vendas VALUES (2, 5, 10, '2021-01-01');
INSERT INTO vendas VALUES (3, 5, 5, '2020-01-01');
INSERT INTO vendas VALUES (4, 4, 2, '2020-01-01');
INSERT INTO vendas VALUES (5, 3, 10, '2021-01-01');
INSERT INTO vendas VALUES (6, 3, 12, '2020-01-01');
INSERT INTO vendas VALUES (7, 3, 20, '2020-01-01');
INSERT INTO vendas VALUES (8, 2, 10, '2020-01-01');
INSERT INTO vendas VALUES (9, 1, 11, '2021-01-01');
INSERT INTO vendas VALUES (10, 9, 4, '2020-01-01');
My attempt (absurd values and the RS ones do not appear):
SELECT
e.estado, SUM(v.valor) as sum2021, SUM(v2.valor) as sum2020
FROM vendas v
CROSS JOIN vendas v2
INNER JOIN municipio m ON v.municipio = m.id
INNER JOIN estado e ON m.estado = e.id
WHERE v.data_venda = '2021-01-01'
AND v2.data_venda = '2020-01-01'
GROUP BY 1;
Translating some terms:
município = city
estado = state
vendas = sales
valor = value
data_venda = date of sale
CodePudding user response:
You're cross joining vendas
with itself (as v1
and v2
), meaning that each row from it will be matched with each other row (i.e., a Cartesian product), which creates the unexpected results you're seeing.
The good news is that you don't need this join. You can use an aggregate function (sum
in this case) on a subset of the rows from the query using the filter
clause:
SELECT
e.estado,
SUM(v.valor) FILTER (WHERE data_venda = '2021-01-01') AS sum2021,
SUM(v.valor) FILTER (WHERE data_venda = '2020-01-01') AS sum2020
FROM vendas v
INNER JOIN municipio m ON v.municipio = m.id
INNER JOIN estado e ON m.estado = e.id
GROUP BY
e.estado;