I am very new to SQL (using it in R currently with RSQLite and DBI packages)
I am trying to create a column that is the absolute mean deviation, aka:
(i) - AVG(i,g)
Where i is the individual occurence and the AVG component is the average for the group. What I am having troubles with is making sure the AVG component is the only part that gets grouped. When I do GROUP BY, it groups everything and doesn't give me the right number.
Here is the sample data:
student | class | grade |
---|---|---|
A | English | 79 |
A | Spanish | 65 |
A | Chemistry | 92 |
B | English | 46 |
B | Spanish | 83 |
B | Chemistry | 78 |
C | English | 67 |
C | Spanish | 87 |
C | Chemistry | 98 |
D | English | 99 |
D | Spanish | 80 |
D | Chemistry | 75 |
Basically I would want the individual GRADE for a student in a class to compare with the average of that student (e.g. the individual english grade - the total average for a student)
Iha ve tried the following:
dbGetQuery(gradesdb, "SELECT student,
ABS(grade-AVG(grade)) AS mad
FROM grades
GROUP BY student,class")
This gives me 0 for all of the stat values (which I gather is because the group by is going on all selected operations within the query)
how can I make it so that the AVG portion of the calculation is only "grouped" by the student. I get the right calculation if I do:
dbGetQuery(gradesdb2, "SELECT student,
ABS(grade-AVG(grade)) AS mad
FROM grades
GROUP BY student")
But then I only get the first class for each student, instead of the stat for each class with the student average.
I want to do this all in SQL and not calculate the average as a seperate column with base R or tidyverse.
Thank you so much for your help!
CodePudding user response:
Consider turning AVG()
via GROUPBY
to AVG()
via a window function:
SELECT student,
ABS(grade - AVG(grade) OVER (PARTITION BY student)) AS mad
FROM grades