Collapse multiple rows with character variables -- SQL-CodePudding

I have a table that looks like the following:

id           gender          race          income
1            M             [REDACTED]      10,000
1            [REDACTED]    2054-5          [REDACTED]
2            F             [REDACTED]      50,000
2            [REDACTED]    2054-5          [REDACTED]

I am trying to collapse it by the id variable, such that I get this dataset

id          gender    race    income
1           M         2054-5  10,000
2           F         2054-5  50,000

Normally, I would do the following:

select 
max(gender),
max(race),
max(income),
id
from
table
group by id

but gender, race, and income are not numeric so I can't do that. Is there a way to select the non-redacted answers? For reference, I am conducting this in Snowflake using SnowSql.

CodePudding user response：

You can try replacing the '[REDACTED]' value with null before the aggregation:

select max(CASE WHEN gender='[REDACTED]' THEN NULL ELSE gender END),
       max(CASE WHEN race  ='[REDACTED]' THEN NULL ELSE race   END),
       max(CASE WHEN income='[REDACTED]' THEN NULL ELSE income END),
       id
from table
group by id

For a tested solution, please update your post with the DBMS you're using.

CodePudding user response：

You can use CASE or NULLIF @lemon's answer shows you how to use CASE here is how to use NULLIF

select max(NULLIF(gender,'[REDACTED]')) as gender,
       max(NULLIF(race  ,'[REDACTED]')) as race,
       max(NULLIF(income,'[REDACTED]')) as income,
       id
from table
group by id