Home > Mobile >  calculating percentage of two columns in SQL - using BigQuery
calculating percentage of two columns in SQL - using BigQuery

Time:06-08

I am trying to calculate the percentage of crime status(clearance_status) over total crimes per category (primary_type)and I just couldn't make the output right.

here is my initial query:

SELECT primary_type, 
       clearance_status, 
       round(count(clearance_status)/(SELECT count(primary_type) from `bigquery-publicdata.austin_crime.crime`)*100,2) as percentage
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY primary_type, clearance_status

result:

Row primary_type    clearance_status     percentage 
1   Theft: BOV.     Not cleared          8.59
2   Theft: BOV.     null.                0.0
3   Theft: BOV      Cleared by Arrest.   0.21
4   Theft: BOV      Cleared by Exception 0.03

But looking at the result of COUNT(primary_type) and COUNT(clerance_status)

SELECT primary_type, count(primary_type) as count
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY primary_type

result:

Row primary_type    count   
1   Theft: BOV.     10545

SELECT clearance_status, count(clearance_status) as count_1
FROM `bigquery-public-data.austin_crime.crime`
WHERE primary_type = "Theft: BOV"
GROUP BY clearance_status

result:

Row clearance_status     count_1    
1   Not cleared.         10028
2   null                 0
3   Cleared by Arrest.   242
4   Cleared by Exception 30

Manually, I should get 95% for "Not cleared" status but my initial query yields 8.59 only. using over() gives the same output.

CodePudding user response:

SELECT
  primary_type,
  clearance_status,
  COUNT(*) AS clearance_status_count,
  (SUM(COUNT(*)) OVER(PARTITION BY primary_type)) AS total,
  COUNT(*) / (SUM(COUNT(*)) OVER(PARTITION BY primary_type)) AS ratio
FROM
  `bigquery-public-data.austin_crime.crime`
GROUP BY
  primary_type,
  clearance_status
ORDER BY
  primary_type,
  clearance_status;

This query works across the different primary_types. I included the count and the total to show that the ratio is correct.

Similar question here: Calculate percentage of group using GROUP BY

  •  Tags:  
  • sql
  • Related