I'm trying to create a query that displays a user's Id, the sum of total steps, and sum of total calories burnt.
The data for steps and calories are within two datasets, so I used JOIN
. However, when I write out the query, the joined data does not look correct. However when I do them separately, it appears to show the correct data
Below are my queries...I am fairly new to SQL, so I am somewhat confused on what I did wrong. How do I correct this? Thank you in advanced for the help!
For the Steps table, "Id" and "StepTotal" are Integers. For the Calories table, "Id" and "Calories" are also Integers.
SELECT steps.Id,Sum(StepTotal) AS Total_steps,Sum(cal.Calories) as Total_calories
FROM fitbit.Daily_steps AS steps
JOIN fitbit.Daily_calories AS cal ON steps.Id=cal.Id
GROUP BY Id
For Steps
SELECT Id,Sum(StepTotal) AS Total_steps
FROM fitbit.Daily_steps
group by Id
Id | Total_steps |
---|---|
1503960366 | 375619 |
1624580081 | 178061 |
1644430081 | 218489 |
For Calories
SELECT Id,Sum(Calories) AS Total_calories
FROM fitbit.Daily_calories
group by Id
Id | Total_calories |
---|---|
1503960366 | 56309 |
1624580081 | 45984 |
1644430081 | 84339 |
CodePudding user response:
I believe your current solution is returning additional rows as the result of the JOIN.
Let's look at an example data set
Steps
id | total
a | 5
a | 7
b | 3
Calories
id | total
a | 100
a | 300
b | 400
Now, if we SELECT * FROM Calories
, we'd get 3 rows. If we SELECT * FROM Calories GROUP BY id
, we'd get two rows.
But if we use a JOIN:
SELECT Steps.id, Steps.total AS steps, Calories.total AS cals FROM Steps
JOIN Calories
ON Steps.id = Calories.id
WHERE id = 'a'
This would return the following:
Steps_Calories
id | steps | cals
a | 5 | 100
a | 5 | 300
a | 7 | 100
a | 7 | 300
So now if we GROUP BY
& SUM(steps)
, we get 24, instead of the expected 12, because the JOIN
returns each pairing of steps & calories.
To mitigate this, we can use sub-queries & group & sum within the sub-queries
SELECT Steps.id, Steps.total AS steps, Calories.total AS cals
FROM (SELECT id, SUM(total) FROM Steps GROUP BY id) as step_totals
JOIN (Select id, SUM(total) FROM Cals GROUP BY id) as cal_totals
JOIN Calories
ON cal_totals.id = step_totals.id
Now each subquery only returns a single row for each id, so the join only returns a single row as well.
Of course, you'll have to adapt this for your schema.