Home > Mobile >  Remove duplicates in single column from view in SQL Server?
Remove duplicates in single column from view in SQL Server?

Time:06-30

I (beginner SQL user) have a SQL Server view, created with the following statement:

CREATE VIEW dbo.v_LSitesGeo AS

SELECT   PS.LSiteID, GEO.geographyColumn
FROM         dbo.Project_SiteID_Lookup AS PS INNER JOIN
             dbo.Geocodio AS GEO ON GEO.ProjectCode = PS.ProjectNumber
WHERE     (NOT (GEO.geographyColumn IS NULL)) AND (NOT (PS.LSiteID IS NULL))

However, the [LSiteID] column has duplicate values in it. I want to filter out any duplicates in the view however, they need to stay in the [Project_SiteID_Lookup] table. All of the [ProjectNumber] values are unique, so I can't use DISTINCT.

I tried to follow this post, however I wasn't able to get the following to work:

SELECT   PS.LSiteID, GEO.geographyColumn

FROM         dbo.Project_SiteID_Lookup AS PS INNER JOIN 
             dbo.Geocodio AS GEO ON GEO.ProjectCode = PS.ProjectNumber


WHERE       0 = (SELECT COUNT(PS.LSiteID) 
            FROM dbo.Project_SiteID_Lookup AS PS, dbo.Project_SiteID_Lookup AS PS2
            WHERE PS2.LSiteID = PS.LSiteID
            AND PS2.LSiteID < PS.LSiteID)

Any advice on how to make the above work, or another method to achieve this, would be much appreciated.

CodePudding user response:

You can use ROW_NUMBER() to identify the first row of each group. Then, filtering out the other rows is easy. For example:

select *
from (
  SELECT
    PS.LSiteID,
    GEO.geographyColumn,
    row_number() over(partition by PS.LSiteID order by PS.LSiteID) as rn
  FROM dbo.Project_SiteID_Lookup AS PS 
  INNER JOIN dbo.Geocodio AS GEO ON GEO.ProjectCode = PS.ProjectNumber
  WHERE (NOT (GEO.geographyColumn IS NULL)) 
    AND (NOT (PS.LSiteID IS NULL))
) x
where rn = 1

CodePudding user response:

You do not want to filter the duplicates; you want to aggregate them. This is an important difference, which quite directly points you to aggregation, that is, to group by:

SELECT   PS.LSiteID, STRING_AGG(GEO.geographyColumn, ',')
FROM         dbo.Project_SiteID_Lookup AS PS INNER JOIN
             dbo.Geocodio AS GEO ON GEO.ProjectCode = PS.ProjectNumber
WHERE     (NOT (GEO.geographyColumn IS NULL)) AND (NOT (PS.LSiteID IS NULL))
GROUP BY PS.LSiteID

No worries about the duplicates, the group by resolves it.

  • Related