In SQL how can I remove duplicates on a column without including it in the SELECT command/ the extra-CodePudding

I'm trying to remove duplicates on a column in SQL, without including that column in the extract (since it contains personally identifiable data). I thought I might be able to do this with nested queries (as below), however this isn't working. I also thought it might be possible to remove duplicates in the WHERE statement, but couldn't find anything from googling. Any ideas? Thanks in advance.

SELECT [ETHNIC], [RELIGION]  
FROM  
(SELECT DISTINCT [ID], [ETHNIC], [RELIGION]  
                                  FROM MainData)

CodePudding user response：

Using distinct like that will apply distinct to the row, so if there are two rows with the same ID but different ETHNIC and RELIGION the distinct won't remove them. To do that you could use group by in your query, but then you need to use an aggregation (e.g. max):

SELECT [ETHNIC], [RELIGION]
FROM
(SELECT [ID], MAX([ETHNIC]) AS ETHNIC, MAX([RELIGION]) AS RELIGION
FROM MainData
GROUP BY [ID])

If that's not what you're looking for, some SQL dialects require that you name your inner select, so you could try adding AS X to the end of your query.