Home > Mobile >  Trying to display Using Sum() with JOIN doesn't provide correct output
Trying to display Using Sum() with JOIN doesn't provide correct output

Time:12-27

I'm trying to create a query that displays a user's Id, the sum of total steps, and sum of total calories burnt.

The data for steps and calories are within two datasets, so I used JOIN. However, when I write out the query, the joined data does not look correct. However when I do them separately, it appears to show the correct data

Below are my queries...I am fairly new to SQL, so I am somewhat confused on what I did wrong. How do I correct this? Thank you in advanced for the help!

For the Steps table, "Id" and "StepTotal" are Integers. For the Calories table, "Id" and "Calories" are also Integers.

SELECT steps.Id,Sum(StepTotal) AS Total_steps,Sum(cal.Calories) as Total_calories
FROM fitbit.Daily_steps AS steps
JOIN fitbit.Daily_calories AS cal ON steps.Id=cal.Id
GROUP BY Id

Given Output(Picture)

Expected Output(Picture)

For Steps

SELECT Id,Sum(StepTotal) AS Total_steps
FROM fitbit.Daily_steps
group by Id

Id Total_steps
1503960366 375619
1624580081 178061
1644430081 218489

For Calories

SELECT Id,Sum(Calories) AS Total_calories
 FROM fitbit.Daily_calories
group by Id
Id Total_calories
1503960366 56309
1624580081 45984
1644430081 84339

CodePudding user response:

I believe your current solution is returning additional rows as the result of the JOIN.

Let's look at an example data set

Steps
id   | total
a    |   5
a    |   7
b    |   3

Calories
id   | total
a    |   100
a    |   300
b    |   400

Now, if we SELECT * FROM Calories, we'd get 3 rows. If we SELECT * FROM Calories GROUP BY id, we'd get two rows.

But if we use a JOIN:

SELECT Steps.id, Steps.total AS steps, Calories.total AS cals FROM Steps
JOIN Calories
   ON Steps.id = Calories.id
WHERE id = 'a'

This would return the following:

Steps_Calories
id | steps | cals
a  |   5   | 100
a  |   5   | 300
a  |   7   | 100
a  |   7   | 300

So now if we GROUP BY & SUM(steps), we get 24, instead of the expected 12, because the JOIN returns each pairing of steps & calories.

To mitigate this, we can use sub-queries & group & sum within the sub-queries

SELECT Steps.id, Steps.total AS steps, Calories.total AS cals 
    FROM (SELECT id, SUM(total) FROM Steps GROUP BY id) as step_totals
    JOIN (Select id, SUM(total) FROM Cals GROUP BY id) as cal_totals
    JOIN Calories
        ON cal_totals.id = step_totals.id

Now each subquery only returns a single row for each id, so the join only returns a single row as well.

Of course, you'll have to adapt this for your schema.

  • Related