Finding x rows with distinct value in col1 but same value in col2-CodePudding

Using postgres 11 I have a data table int_state that looks like this:

dev	int
sw01	gi1
sw01	gi2
sw01	gi3
sw01	gi4
sw02	gi3
sw02	gi4
sw02	gi5
sw03	gi3
sw03	gi4
sw03	gi6
sw03	gi7

I need a query that will return a single row for every instance of dev where int is the same across every instance of dev. Using the sample above query should only return "gi3" as the value for int since that value is shared between all instances of dev and result is limited is a single row for every dev. Desired results from sample data above:

dev	int
sw01	gi3
sw02	gi3
sw03	gi3

The number of dev instances (x) to match against will vary, but results should always contain x rows, with a row for every dev asked for in the query. If there are multiple matches against int the query should only return the first one found.

Using a JOIN doesn't seem like it will work for me, since I need x rows of results, not a single row with all results combined. I have been playing around with CTEs and UNION but so far unable to get it working. I thought nesting CTEs would help, but apparently that's not supported in postgres.

I plan on constructing this query on-demand in code ahead of time, so if I have to construct something that contains where dev in ('sw01','sw02','x','y','z') that's okay.

CodePudding user response：

WITH distinctdevs AS (SELECT DISTINCT dev FROM int_state) 
SELECT distinctdevs.dev, (SELECT int FROM int_state ints WHERE NOT EXISTS 
 (SELECT dev FROM distinctdevs WHERE NOT EXISTS (
   SELECT * FROM int_state WHERE 
     int_state.int = ints.int AND int_state.dev = distinctdevs.dev
   )
  ) LIMIT 1) FROM distinctdevs;

You might actually get better performance if you don't reuse the CTE on line 3, as that might trick the optimizer into thinking this is a correlated subquery rather than a constant. The core trick here is the doubly nested WHERE NOT EXISTS, converting "an int that appears for every dev" to the logically equivalent "an int for which there is no dev for which it does not appear."

CodePudding user response：

ok, so if I understand your question, you want to find all the instances of dev (01, 02, 03, etc), and then check that the value for int is the same for all the instances?

You can do a clever workaround I think....

---- get the count of how many instances of dev there are...
    WITH distinct_dev AS (  
      SELECT  
        dev  
        , count(*) AS dev_count  
      FROM
        table
      GROUP BY
        1
    )
----- get the count of how many instances of int there are per dev/int
    , dev_int AS (
      SELECT 
         t.dev
         , t.int
         , COUNT(DISTINCT t2.int) as count_dev_int
      FROM
        original_table t
        LEFT JOIN
        original_table t2
        ON t.dev = t2.dev
        AND t.int = t2.int
     GROUP BY
       1, 2
    )
----- then from the first CTE select unique devs, and join the 2nd CTE where the dev envs = the count of int per dev env....
SELECT
   dd.dev
   , di.int
FROM
   distinct_dev dd
   JOIN
   dev_int di
   ON di.dev = dd.dev
   AND di.count_dev_int = dd.count_dev

CodePudding user response：

You can build CTE with an array of distinct dev values and another with array of dev values for each int using array_agg. Then JOIN then on matching arrays. (see demo)

with dev_array as 
     ( select array_agg(distinct dev) dev 
         from data_tab
     ) 
   , int_match as 
     (select int, array_agg(dev) dev
        from data_tab
       group by int
     )
select int, unnest(dev) dev
  from (select im.*
          from dev_array da
          join int_match im  
            on (im.dev = da.dev)
         order by im.int
         limit 1
       ) sq;