Home > Software engineering >  Unexplained extra rows after MySQL join
Unexplained extra rows after MySQL join

Time:07-13

I've stripped this down as much as possible, so I hope it's easy to understand. Currently, I have two tables with website data, display and traffic. Both tables have the columns date and site.

display has revenue,
traffic has pageviews.

My end goal is to produce results with revenue and pageviews for each of our sites, grouped by week. So, four columns: week, site, pageviews, revenue - where week is week(date) AS week.

Here is my best attempt so far:

SELECT display.site,
       Week(display.date) AS dateweek,
       Sum(revenue)       AS revenue,
       tr.pageviews
FROM   display
       INNER JOIN (SELECT traffic.site,
                          Week(date)     AS date_granularity,
                          Sum(pageviews) AS pageviews
                   FROM   traffic
                   GROUP  BY traffic.site,
                             date_granularity) AS tr
               ON tr.date_granularity = Week(display.date)
WHERE  display.site = 'Google'
GROUP  BY dateweek,
          display.site,
          tr.pageviews 

(Google is in place of the actual site name of our client)

I will take week 17 from the results grid as an example:

 -------- ---------- -------------------- ----------- 
|  site  | dateweek |      revenue       | pageviews |
 -------- ---------- -------------------- ----------- 
| Google |       17 | 1424.0631110000002 |   1151716 |
| Google |       17 | 1424.0631110000002 |   1169364 |
| Google |       17 | 1424.0631110000002 |   1427273 |
| Google |       17 | 1424.0631110000002 |   6776535 |
 -------- ---------- -------------------- ----------- 

Doing queries from the individual tables with no joins, this is the correct and expected output for week 17:

 -------- ---------- -------------------- ----------- 
|  site  | dateweek |      revenue       | pageviews |
 -------- ---------- -------------------- ----------- 
| Google |       17 | 1424.0631110000002 |   1169364 |
 -------- ---------- -------------------- ----------- 

It feels like I'm so close, as the row I expect I'm already getting, but there's some random extras. Any ideas? If there's anything that's unclear I'm happy to add more detail.

CodePudding user response:

If you only put one ON condition, the JOIN will match the condition only. In your case, you're matching just the WEEK so that means regardless of any site from the subquery, as long as the WEEK value match, it will return result. The fix is very simple, you just need to add 1 more condition in ON, so:

SELECT display.site,
       Week(display.date) AS dateweek,
       Sum(revenue)       AS revenue,
       tr.pageviews
FROM   display
       INNER JOIN (SELECT traffic.site,
                          Week(date)     AS date_granularity,
                          Sum(pageviews) AS pageviews
                   FROM   traffic
                   GROUP  BY traffic.site,
                             date_granularity) AS tr
               ON tr.date_granularity = Week(display.date)
              /*add another condition here*/ 
               AND tr.site=display.site
WHERE  display.site = 'Google'
GROUP  BY dateweek,
          display.site,
          tr.pageviews 

Demo fiddle

  • Related