I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr
values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid
and dobyr
limited to the cases where there is only one value of dobyr
for the userid
in userinteractions
.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions
table, so I need a SELECT
solution, not a DELETE
solution.
CodePudding user response:
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
- If there are many rows with the same
userid
value but also the samedobyr
value, one of them is kept (doesn't matter which one), rest gets discarded. - All rows for a given
userid
are discarded if it occurs with differentdobyr
values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in
, not exists
or exists
conditions. Also, select which combination to keep by adding columns at the end of order by
.
Demo with tests and more rows.
CodePudding user response:
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1