Home > Software engineering >  summarize() equivalent in SQL (creating a summary table in SQL)
summarize() equivalent in SQL (creating a summary table in SQL)

Time:09-22

I wish to create a summary table. For example, the dataset df looks like this:

type   category   color
bmw    a          black
bmw    b          black
bmw    c          white 
bmw    d          black 
bmw    e          white 
.....

Basically, type will be the same value throughout the whole dataset, and category will be all unique values. What I want to do is to figure out the proportion of the colors. So I hope to get an output that looks something like this:

color   count   prop
black   48      0.34 
white   62      0.52
grey    20      ...
.....

Please note that the numbers and proportions above are all made up, not proportional, just for illustration purposes.

In R, all I have to do is use dplyr and use the summarise function to output the desired result, but not sure how to do it in SQL.

I've tried doing the following code:

SELECT type, category, color, count(*) 
FROM df 

However it gave me an error.

CodePudding user response:

You can use aggregation and window functions:

select color, count(*),
       count(*) * 1.0 / sum(count(*)) over ()
from df
group by color;
  • Related