I have a table that looks like the following:
id gender race income
1 M [REDACTED] 10,000
1 [REDACTED] 2054-5 [REDACTED]
2 F [REDACTED] 50,000
2 [REDACTED] 2054-5 [REDACTED]
I am trying to collapse it by the id variable, such that I get this dataset
id gender race income
1 M 2054-5 10,000
2 F 2054-5 50,000
Normally, I would do the following:
select
max(gender),
max(race),
max(income),
id
from
table
group by id
but gender, race, and income are not numeric so I can't do that. Is there a way to select the non-redacted answers? For reference, I am conducting this in Snowflake using SnowSql.
CodePudding user response:
You can try replacing the '[REDACTED]'
value with null before the aggregation:
select max(CASE WHEN gender='[REDACTED]' THEN NULL ELSE gender END),
max(CASE WHEN race ='[REDACTED]' THEN NULL ELSE race END),
max(CASE WHEN income='[REDACTED]' THEN NULL ELSE income END),
id
from table
group by id
For a tested solution, please update your post with the DBMS you're using.
CodePudding user response:
You can use CASE or NULLIF @lemon's answer shows you how to use CASE here is how to use NULLIF
select max(NULLIF(gender,'[REDACTED]')) as gender,
max(NULLIF(race ,'[REDACTED]')) as race,
max(NULLIF(income,'[REDACTED]')) as income,
id
from table
group by id