I was a doing a SQL challenge which asked to find the total number of patients grouped by gender and to generate the output as
male_patients | female_patients |
---|---|
200 | 250 |
(Note: The above numbers are not the exact ones; used Just for demonstrating the output)
The gender of the patients were in the column 'gender' which had two distinct values - 'M' for males and 'F' for females.
One of the solution provided was
SELECT
SUM(gender = 'M') AS male_patients,
SUM(gender = 'F') AS female_patients
FROM
patients;
which gave the expected output.
I would like to know the working of SUM() in the above code.
Thanks in advance.
(This can be solved in many ways but Iam interested to know how the expected output was generated by using SUM())
CodePudding user response:
The expression
SUM(gender = 'M')
is invalid in standard SQL and many DBMS. You simply cannot add up booleans, because what is the sum of true false true supposed to be? almost_true?
In some DBMS including MySQL, though, the query is valid, because they implicitly convert booleans to numbers, making true = 1 and false = 0. Thus your query would be interpreted as: add one for every row where the expression evaluates to true. So, in MySQL for instance, you are eventually counting all rows where the condition is met.
In standard SQL you'd add up the males by
SELECT COUNT(*) FILTER (WHERE gender = 'M') AS male_patients
CodePudding user response:
The input of the sum function in standard SQL and many DBMS should only be a numerical value and a conditional value cannot be placed inside it. The following command in standard SQL format will answer your needs:
SELECT
(SELECT SUM(1) FROM patients WHERE Gender='M') as male_patients,
(SELECT SUM(1) FROM patients WHERE Gender='F') as female_patients
or
SELECT M as [male_patients],F as [female_patients] FROM
(SELECT Gender,count(1) as CountGender FROM patients GROUP BY Gender) t
pivot(sum(CountGender) FOR Gender IN ([M],[F])) AS pivot_table
CodePudding user response:
I am making the assumption you gave the full code, and didn't miss copy pasting a block. Therefore, I feel like you are working with a specific flavor of SQL that allows for you to imply the only table imported applies to all the query? Otherwise, there's no way this code should be compiling without some sort of subquery within it, because SQL shouldn't know what you mean by "gender" in those SUM statements without knowing there's a patient table within the SUM, that has a field called "gender".
For illustrative purposes, I want to write out what it would be without the syntatic suger your SQL flavor is using (I am making some assumptions on what your fields are called, based on what would make sense for DB design):
SELECT SUM(SELECT age FROM patients WHERE gender = 'M') AS male_patients,
SUM(SELECT age FROM patients WHERE gender = 'F') AS female_patients
FROM patients;
The sum will take the sum of all values given. In both cases (again, I am assuming your code is just syntatic suger for what I am doing in the above snippet), I am giving it all of the ages, from the male patients in that first sum. It will then do literally what I asked it to do, which is spit out the sum of all of those ages.
As one other way to say this, Sum will return the total of all values given as the first argument.
I hope this is getting at what you were looking for here, but if there's a code block you didn't include, please put it in, and I will do an edit of my reply based on that.
Edit: fixed a few typos