I am trying to calculate the percentage of crime status(clearance_status) over total crimes per category (primary_type)and I just couldn't make the output right.
here is my initial query:
SELECT primary_type,
clearance_status,
round(count(clearance_status)/(SELECT count(primary_type) from `bigquery-publicdata.austin_crime.crime`)*100,2) as percentage
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY primary_type, clearance_status
result:
Row primary_type clearance_status percentage
1 Theft: BOV. Not cleared 8.59
2 Theft: BOV. null. 0.0
3 Theft: BOV Cleared by Arrest. 0.21
4 Theft: BOV Cleared by Exception 0.03
But looking at the result of COUNT(primary_type) and COUNT(clerance_status)
SELECT primary_type, count(primary_type) as count
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY primary_type
result:
Row primary_type count
1 Theft: BOV. 10545
SELECT clearance_status, count(clearance_status) as count_1
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY clearance_status
result:
Row clearance_status count_1
1 Not cleared. 10028
2 null 0
3 Cleared by Arrest. 242
4 Cleared by Exception 30
Manually, I should get 95% for "Not cleared" status but my initial query yields 8.59 only. using over() gives the same output.
CodePudding user response:
SELECT
primary_type,
clearance_status,
COUNT(*) AS clearance_status_count,
(SUM(COUNT(*)) OVER(PARTITION BY primary_type)) AS total,
COUNT(*) / (SUM(COUNT(*)) OVER(PARTITION BY primary_type)) AS ratio
FROM
`bigquery-public-data.austin_crime.crime`
GROUP BY
primary_type,
clearance_status
ORDER BY
primary_type,
clearance_status;
This query works across the different primary_types
. I included the count and the total to show that the ratio is correct.
Similar question here: Calculate percentage of group using GROUP BY