Consider that I have the following tables/entities:
Posts
id | user_id | title | content |
---|---|---|---|
1 | 1 | Article 1 | Lorem ipsum |
2 | 1 | Article 1 | Lorem ipsum |
3 | 1 | Article 2 | Lorem ipsum 2 |
4 | 2 | Article 3 | Lorem ipsum |
Users
id | name |
---|---|
1 | John Doe |
2 | Timothy Fisher |
Notice that there are two posts from the user with an ID of 1 that have the same title and content. There was an error at the application level that allowed a user to submit a post twice in the past, leading to "duplicate" records.
I'm looking to query all of the posts, but consolidate the posts for each user that have duplicate titles and content.
The ideal result set would look like this:
post_id | author_name | title | content |
---|---|---|---|
1 | John Doe | Article 1 | Lorem ipsum |
3 | John Doe | Article 2 | Lorem ipsum 2 |
4 | Timothy Fisher | Article 3 | Lorem ipsum |
This is the query I have so far, which returns every post, including the duplicates:
SELECT
posts.id as post_id,
users.name as author_name,
posts.title,
posts.content
FROM
posts
INNER JOIN
users
ON
posts.user_id = users.id;
Whether the query pulls post 1 or post 2 for John Doe doesn't matter. In the actual database I have timestamps, so I'd likely just pull the latest one.
Is this possible with SQL?
CodePudding user response:
You could use an aggregate function and GROUP BY: aggregate the post id (here with MIN, purely to pick one representative row) and group by the columns that define a duplicate:
SELECT
min(posts.id) as post_id,
users.name as author_name,
posts.title,
posts.content
FROM posts
INNER JOIN users ON posts.user_id = users.id
GROUP BY users.name, posts.title, posts.content;
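A minimal runnable sketch of this approach, using Python's built-in sqlite3 with the question's sample data (table and column names taken from the question; swap MIN for MAX if you'd rather keep the last-inserted duplicate):

```python
import sqlite3

# In-memory database seeded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, content TEXT);
INSERT INTO users VALUES (1, 'John Doe'), (2, 'Timothy Fisher');
INSERT INTO posts VALUES
    (1, 1, 'Article 1', 'Lorem ipsum'),
    (2, 1, 'Article 1', 'Lorem ipsum'),
    (3, 1, 'Article 2', 'Lorem ipsum 2'),
    (4, 2, 'Article 3', 'Lorem ipsum');
""")

# MIN(posts.id) collapses each (author, title, content) group to one row.
rows = conn.execute("""
SELECT
    min(posts.id) as post_id,
    users.name as author_name,
    posts.title,
    posts.content
FROM posts
INNER JOIN users ON posts.user_id = users.id
GROUP BY users.name, posts.title, posts.content
ORDER BY post_id
""").fetchall()

for row in rows:
    print(row)
# → (1, 'John Doe', 'Article 1', 'Lorem ipsum')
#   (3, 'John Doe', 'Article 2', 'Lorem ipsum 2')
#   (4, 'Timothy Fisher', 'Article 3', 'Lorem ipsum')
```

The duplicate posts 1 and 2 collapse into one row, matching the ideal result set in the question.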
CodePudding user response:
In MySQL you can also do it without aggregation, though I'm not sure it's a better way (more info here). Note that selecting a non-aggregated, non-grouped column like posts.id only works when the ONLY_FULL_GROUP_BY SQL mode is disabled, and which duplicate's id you get back is indeterminate.
SELECT
posts.id as post_id,
users.name as author_name,
posts.title,
posts.content
FROM
posts
INNER JOIN
users
ON
posts.user_id = users.id
GROUP BY
users.name, posts.title, posts.content;
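Since the asker mentions wanting the latest duplicate by timestamp, a window-function variant (MySQL 8+, or any database with ROW_NUMBER) avoids the indeterminacy of the non-aggregated GROUP BY. Sketched here against SQLite; the created_at column is an assumption, since the question only says timestamps exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- created_at is an assumed timestamp column.
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT,
                    content TEXT, created_at TEXT);
INSERT INTO users VALUES (1, 'John Doe'), (2, 'Timothy Fisher');
INSERT INTO posts VALUES
    (1, 1, 'Article 1', 'Lorem ipsum',   '2022-01-01'),
    (2, 1, 'Article 1', 'Lorem ipsum',   '2022-01-02'),
    (3, 1, 'Article 2', 'Lorem ipsum 2', '2022-01-03'),
    (4, 2, 'Article 3', 'Lorem ipsum',   '2022-01-04');
""")

# Rank duplicates newest-first within each (user, title, content) group,
# then keep only the top-ranked row of each group.
rows = conn.execute("""
SELECT post_id, author_name, title, content FROM (
    SELECT
        posts.id AS post_id,
        users.name AS author_name,
        posts.title,
        posts.content,
        ROW_NUMBER() OVER (
            PARTITION BY posts.user_id, posts.title, posts.content
            ORDER BY posts.created_at DESC
        ) AS rn
    FROM posts
    INNER JOIN users ON posts.user_id = users.id
) ranked
WHERE rn = 1
ORDER BY post_id
""").fetchall()

for row in rows:
    print(row)
# → (2, 'John Doe', 'Article 1', 'Lorem ipsum')
#   (3, 'John Doe', 'Article 2', 'Lorem ipsum 2')
#   (4, 'Timothy Fisher', 'Article 3', 'Lorem ipsum')
```

Here post 2 survives instead of post 1 because its created_at is later, which matches the asker's "pull the latest one" requirement.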