I've tried writing my sql query to select multiple records on to one row but it isn't working the way I expected it to Currently my table looks something like this
person id | fruit |
---|---|
1 | apple |
1 | orange |
1 | banana |
2 | apple |
2 | orange |
3 | apple |
I've tried using CASE and GROUP BY but it just gave extra records and didn't display the way I wanted it to and is displaying like this
SELECT DISTINCT
F.MEMBER
,F.GIVEN_NAMES
,F.SURNAME
--VALUES NEEDED
,CASE WHEN F.VALUE_NEEDED = 'Postal Address' THEN 'Yes' ELSE '' END POSTAL_ADDRESS
,CASE WHEN F.VALUE_NEEDED = 'Birthday' THEN 'Yes' ELSE '' END BIRTHDAY
,CASE WHEN F.VALUE_NEEDED = 'Email Address' THEN 'Yes' ELSE '' END EMAIL_ADDRESS
,CASE WHEN F.VALUE_NEEDED = 'First Name' THEN 'Yes' ELSE '' END FIRST_NAME
,CASE WHEN F.VALUE_NEEDED = 'Surname' THEN 'Yes' ELSE '' END SURNAME
,CASE WHEN F.VALUE_NEEDED = 'Title and Gender' THEN 'Yes' ELSE '' END 'TITLE|GENDER'
,CASE WHEN F.VALUE_NEEDED = 'Mobile' THEN 'Yes' ELSE '' END MOBILE
,CASE WHEN F.VALUE_NEEDED = 'Beneficiary' THEN 'Yes' ELSE '' END BENEFICIARY
FROM #FINAL F
GROUP BY F.MEMBER,F.GIVEN_NAMES
,F.SURNAME,VALUE_NEEDED
ORDER BY F.MEMBER
person id | apple | orange | banana |
---|---|---|---|
1 | yes | ||
1 | yes | ||
1 | yes |
How do I write the query so it looks more like this?
person id | apple | orange | banana |
---|---|---|---|
1 | yes | yes | yes |
2 | yes | yes | |
3 | yes |
CodePudding user response:
You are almost there, just needed to add the max and group by to aggregate it . This used to be a typical interview question back then. Some thing like this if I understood correctly
with t as
(
select 1 as person_id, 'apple' fruit
union
select 1 ,'orange'
union
select 1 ,'banana'
union
select 2 ,'apple'
union
select 2 ,'orange'
union
select 3 ,'apple'
)
, b as
(
select
person_id,
case when fruit= 'apple' then 'yes' else null end 'apple',
case when fruit= 'orange' then 'yes' else null end 'orange',
case when fruit= 'banana' then 'yes' else null end 'banana'
from t
)
select
person_id,
max(apple) apple,
max(orange) orange,
max(banana) banana
from b
group by 1;
person_id | apple | orange | banana |
---|---|---|---|
1 | yes | yes | yes |
2 | yes | yes | NULL |
3 | yes | NULL | NULL |
CodePudding user response:
You have tagged the tool that you are using (SQL Server Management Studio) which can be used with different DBMS. As Microsoft's SQL Server is the most typical used in this tool, I assume your are using that.
First let's look at your table. It seems a bit weird. It seems to be a kind of key-value table (aka EAV). Each row tells us for an attribute whether it is needed for a person. Now how to identify a person in the table? Is the column member
a unique person ID? Probably not, because then, what would be given_names
and surname
be for in that table, that can change with every entry. Why would the same person with the ID 1234 be called John Smith when the value_needed
is 'Birthday', but be called 'Anne Miller' when value_needed
is 'Mobile'? That wouldn't make much sense. So maybe member
is just a flag, whether a person is a member or not, and a person is uniquely identified by their given_names
and surname
. But then again, why would the same person John Smith be a member when value_needed
is 'Birthday', but not a member when value_needed
is 'Mobile'? So something is amiss here. It seems your table is not normalized. Better have one person table and one table for the attributes.
That being said, GROUP BY ___
means "I want one result row per ___". You group by the person and their value_needed
. But you don't want one result row per person and value_needed
. You want one resut row per person. Hence, group by person.
Then you SELECT DISTINCT ...
. This means you want to remove duplicate rows. But look at the rows you are selecting. There are no duplicates. If you use GROUP BY
, you can be 99.99% sure you don't need DISTINCT
. (There do exist rare situations where you voluntarily group by columns, don't select all of them and then apply DISTINCT
, but these are so rare that you probably won't ever use them at all.)
Now to the task: You want to get from rows to columns. This is called pivot and can be achieved with the PIVOT
keyword, but it is more common to use conditional aggregation. "Conditional aggregation" means that you aggregate your data (per person) and then apply a condition. In standard SQL:
SELECT MIN('YES') FILTER (WHERE f.value_needed = 'Postal Address')
You can use MIN
or MAX
here, and it is only required for syntax reasons (the FILTER
clause must refer to some aggregation function.
In SQL Server there is no FILTER
clause, so you use a CASE
expression instead:
SELECT MIN(CASE WHEN f.value_needed = 'Postal Address' THEN 'YES' END)
If you want the empty string ''
instead of NULL, apply COALESCE
:
SELECT COALESCE(MIN(CASE WHEN f.value_needed = 'Postal Address' THEN 'YES' END), '')
Columns aliases containing special characters like |
require quoting. But not single quotes, as these denote string literals. In standard SQL use double quotes, in SQL Server use brackets. But better, just avoid them alltogether, by avoiding special characters in names.
The complete query:
SELECT
person_id,
MIN(CASE WHEN value_needed = 'Postal Address' THEN 'yes' end) AS postal_address,
MIN(CASE WHEN value_needed = 'Birthday' THEN 'Yes' end) AS birthday,
MIN(CASE WHEN value_needed = 'Email' THEN 'Yes' END) AS email_address,
MIN(CASE WHEN value_needed = 'First Name' THEN 'Yes' END) AS first_name,
MIN(CASE WHEN value_needed = 'Surname' THEN 'Yes' END) AS surname,
MIN(CASE WHEN value_needed = 'Title and Gender' THEN 'Yes' END) AS title_gender,
MIN(CASE WHEN value_needed = 'Mobile' THEN 'Yes' END) AS mobile,
MIN(CASE WHEN value_needed = 'Beneficiary' THEN 'Yes' END) AS beneficiary
FROM #FINAL
GROUP BY person_id
ORDER BY person_id;