Home > Software design >  Join Tables with different data dimensions (weekly and daily)
Join Tables with different data dimensions (weekly and daily)

Time:05-06

I have a table with weekly data acquisitions and a second table with daily data acquisitions. I need to make a union between the tables (INNER JOIN) through the id key and for the daily table values, aggregate them in an average per week.

So, my resulting table will also be with weekly dates, and for each week it should present the average value of the id in the previous 7 days (including the day of acquisition).

Data samples: DBfiddle

In this example provided, the join and group operation are returning the correct results. But he is taking the global average, not the average corresponding to the previous 7 days for each weekly acquisition date.

Does anyone have an idea of the best way to solve this?

CodePudding user response:

Subquery will be enough, but only if you are interested exactly in last 7 days for each row

select id, rating, aquisition_date , 
(select avg(value) from table2 a2 where a2.id = a1.id and 
a2.aquisition_date between a1.aquisition_date - 6 and a1.aquisition_date) avg
from table1 a1

Results

id rating aquisition_date avg
1 10 2022-04-18 5
1 8.2 2022-04-25 14.57142

In attached example you also provide time values (but due to table definition they are cuted off during insert)

CodePudding user response:

Short Answer: You have to include the week into the JOIN:

SELECT 
    table1.id
    ,EXTRACT(WEEK FROM table1.aquisition_date) AS week
    ,table1.rating AS table1_rating
    ,AVG(table2.value) as table2_value_avg
FROM 
    table1 table1
INNER JOIN 
    table2 table2
ON
    table2.id =   table1.id AND
    EXTRACT(WEEK FROM table1.aquisition_date) = EXTRACT(WEEK FROM table2.aquisition_date)
GROUP BY
    table1.id
   ,table1.rating
   ,table1.aquisition_date
;

This gives us

id  week    table1_rating   table2_value_avg
1   17      8.2             20.0000000000000000
1   16      10              13.1428571428571429

Long Answer: How do we get there? Consider the inner join without the aggregation:

Your first attempt is

SELECT 
    table1.id
    ,EXTRACT(WEEK FROM table1.aquisition_date) AS week1
    ,EXTRACT(WEEK FROM table2.aquisition_date) AS week2
--    ,AVG(EXTRACT(WEEK FROM table2.aquisition_date)) AS week2
    ,table1.rating AS table1_rating
    ,table2.value AS table2_value
--    ,AVG(table2.value) as table2_value_avg
FROM 
    table1 table1
INNER JOIN 
    table2 table2
ON
    table2.id =   table1.id

This will match rows from both tables which do not belong together, e.g values from week 15 with ratings from week 16.

id  week1   week2   table1_rating   table2_value
1   16      15      10              2
1   16      15      10              4
[... 40 more rows]

Actually it matches every row from table1 with table2. That's why the average is the global average, even if the GROUP BY works.

So the solution from above is removing those rows from the inner join where the weeks do not match.

N.B.: If you want to include data in your results where only one of the tables has data you need a FULL OUTER JOIN but the idea is the same.

N.B.: (Thanks @shawnt00) (EXTRACT WEEK FROM ...) means ISO 8601 week. It starts on Monday and there might be other definitions of a week.

N.B.: (Thanks @shawnt00) The solution ignores the year. If data from more than one year is present, you need to match the year and week in the JOIN.

  • Related