Home > Mobile >  SOLVED - Get rid of duplicate rows in SQL query
SOLVED - Get rid of duplicate rows in SQL query

Time:07-21

First of all, this is a database I was given, and it is a pain to work with.

I have this table:

content_id version content_version_id alias publish_date contentattribute_name contentattribute_value
4507 4 10369 /aktuelt/arkiv/ 04/04/2014 category_names CAT1, CAT2
4348 3 10373 /publikasjoner/ 23/05/2011 category_names CAT1, CAT2
4348 4 10374 /publikasjoner/ 23/05/2011 category_names CAT1, CAT2
4439 3 10701 /publikasjoner/ 08/04/2015 category_names CAT1, CAT2
4439 4 10702 /publikasjoner/ 08/04/2015 category_names CAT1, CAT2

I am trying to get rid of duplicates from the column "content_id", and I've searched for answers on stackoverflow, but I couldn't get it to work with my SQL query.

Can anyone help me with the current query?

SELECT db_name.contentversion.ContentId as 'content_id',
db_name.contentversion.Version as 'version',
db_name.contentversion.ContentVersionId as 'content_version_id', 
db_name.content.alias as 'alias',
db_name.content.PublishDate as 'publish_date', 
db_name.contentattributes.Name as 'contentattribute_name', 
db_name.contentattributes.Value as 'contentattribute_value'
FROM db_name.contentversion
JOIN db_name.content
ON db_name.content.ContentId = db_name.contentversion.ContentId
JOIN db_name.contentattributes
ON db_name.contentattributes.ContentVersionId = db_name.contentversion.ContentVersionId

EDIT

I think I managed to solve it by using this query:

SELECT DISTINCT a.ContentId, a.Version, a.ContentVersionId,
db_name.content.alias as 'alias',
db_name.content.PublishDate as 'publish_date', 
db_name.contentattributes.Name as 'contentattribute_name', 
db_name.contentattributes.Value as 'contentattribute_value'
FROM contentversion a
INNER JOIN (
SELECT ContentId, MAX(Version) Version
FROM contentversion
GROUP BY ContentId
) b ON a.ContentId = b.ContentId AND a.Version = b.Version
JOIN db_name.content
ON db_name.content.ContentId = a.ContentId
JOIN db_name.contentattributes
ON db_name.contentattributes.ContentVersionId = a.ContentVersionId

CodePudding user response:

It seems like content_id is not a primary key and it can hold in two different versions ie 3 or 4. You may want to get the version you are interested in so that you filter it using WHERE version= ? or just take distinct content_id if the version type doesn't matter

CodePudding user response:

You have to omit version and content_version_id columns from the query as it has different values for the same content_id. Then, remove duplicate using group by clause as following.

SELECT db_name.contentversion.ContentId as 'content_id',
 db_name.content.alias as 'alias',
 db_name.content.PublishDate as 'publish_date', 
 db_name.contentattributes.Name as 'contentattribute_name', 
 db_name.contentattributes.Value as 'contentattribute_value'
FROM db_name.contentversion
JOIN db_name.content
ON db_name.content.ContentId = db_name.contentversion.ContentId
GROUP BY
 db_name.contentversion.ContentId,
 db_name.content.alias,
 db_name.content.PublishDate, 
 db_name.contentattributes.Name, 
 db_name.contentattributes.Value;

CodePudding user response:

You want the maximum version per content ID. Probably the best approach for this is a lateral join, available since MySQL 8.0.14.

SELECT 
  cv.contentid, 
  cv.version, 
  cv.contentversionid,
  c.alias,
  c.publishdate AS publish_date, 
  ca.name AS contentattribute_name,
  ca.value AS contentattribute_value
FROM db_name.content c
CROSS JOIN LATERAL
(
  SELECT *
  FROM contentversion
  WHERE contentversion.contentid = c.contentid
  ORDER BY contentversion.version DESC
  LIMIT 1
) cv
INNER JOIN db_name.contentattributes ca ON ca.contentversionid = cv.contentversionid
ORDER BY cv.contentid, ca.name;
  • Related