Home > Mobile >  How to group by one column and limit to rows where another column has the same value for all rows in
How to group by one column and limit to rows where another column has the same value for all rows in

Time:12-01

I have a table like this

CREATE TABLE userinteractions
(
userid  bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);

My problem is that some of the data is polluted with multiple dobyr values for the same user.

The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.

I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.

For example I start with data like this:

userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989

And I want to select from this to get a table like this:

userid,dobyr
1,1995
2,1999
4,1989

Is there an elegant, efficient way to get this in a single sql query?

I am using postgres.

EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.

CodePudding user response:

Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:

  1. If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
  2. All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as 
select distinct on (userid,dobyr) * 
from userinteractions
where userid in ( 
    select userid
    from userinteractions
    group by userid
    having count(distinct dobyr)=1 )
order by userid,dobyr;

This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by. Demo with tests and more rows.

CodePudding user response:

Just use a HAVING clause to assert that all rows in a group must have the same dobyr.

SELECT
  userid,
  MAX(dobyr) AS dobyr
FROM
  userinteractions
GROUP BY
  userid
HAVING
  COUNT(DISTINCT dobyr) = 1
  • Related