Home > Net >  Removing duplicate entries whilst joining
Removing duplicate entries whilst joining

Time:07-15

user learning_group program learning_group_user learning_group_program
id id id id id
name name name learning_group_id learning_group_id
user_id program_id

I am working with Postgres. My app will track users and what programs they are assigned to. User are associated with programs by adding them to learning groups. If a user is assigned to a learning group and a program is assigned to that learning group, then he is eligible to complete that program.

Users can be assigned to many learning groups.

Programs can also be assigned to many learning groups

I want to get the total number of users that are associated to a particular program, but, I don't want to count users more than once if they are assigned to multiple learning groups and the program is also associated with those same learning groups.

For example: if user "John Smith" is assigned to 3 learning groups, and, "Science Program" is assigned to all 3 of those learning groups, when I get the total number of users that are eligible to complete the "Science Program", John should only be counted once, even though naturally he would be counted 3 times because he is associated with the program 3 times via the 3 different learning groups.

My plan is to achieve this as a sub query, so when I query a program by id, I can include the number of users that are enrolled in that program:

SELECT 
  p.id, 
  p.data, 
  (
    SELECT count(u.id)
    FROM learning_group_user lgu
    INNER JOIN learning_group_program lgp ON lgp.learning_group_id = lgu.learning_group_id
    INNER JOIN program p ON p.id = lgp.program_id
    INNER JOIN user u ON u.id = lgu.user_id
    WHERE lgp.program_id = 'e4b284da-c94b-4546-84b9-2616e6a32b53'
  ) program_learners_count 
FROM program p
WHERE p.id = 'e4b284da-c94b-4546-84b9-2616e6a32b53';

The above query will count a user multiple times if he is assigned to multiple groups and the program is also assigned to those groups. He should only be counted once.

Update Based on Simonas Petkevičius answer I was able to complete my full query which now looks like this:

SELECT 
    p.*,
    (
        SELECT count(distinct lgl.user_id)
        FROM learning_group_learner lgl
            INNER JOIN learning_group_program lgp 
            ON lgp.learning_group_id = lgl.learning_group_id
        WHERE lgp.program_id = '3772c437-f881-4a69-b846-544d1783b1a4'
    )
FROM program p
WHERE p.id = '3772c437-f881-4a69-b846-544d1783b1a4';

CodePudding user response:

This should do the job:

 SELECT count(distinct lgu.user_id)
   FROM learning_group_user lgu
     INNER JOIN learning_group_program lgp 
       ON lgp.group_id = lgu.group_id

 WHERE lgp.program_id = :programId;

it will only count the distinct userIds, however, note possible performance issues, you might need to optimize it along the way postgresql COUNT(DISTINCT ...) very slow.

Also, note that you don't need two extra joins:

INNER JOIN program p ON p.id = lgp.program_id
INNER JOIN user u ON u.id = lgu.user_id

in your subquery, as you already have user_id and program_id in your many-to-many join tables learning_group_user and learning_group_program.

Sample data and query for your subquery to fiddle around: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=b9a62e9f6d75eb459376a74a589ca46e

Update

Here is a query you could do without a subquery utilizing group by property.

SELECT 
   p.id,
   p.name,
   count(distinct lgu.user_id)
 FROM program as p
   JOIN learning_group_program as lgp ON lgp.program_id = p.id
   JOIN learning_group_user as lgu
        ON lgp.group_id = lgu.group_id

    WHERE lgp.program_id = 99
    group by p.id, p.name;

test: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2fc33ef133a24165573bb15f6404b17a

CodePudding user response:

You can change your perspective a bit, like.. using where exists, something like:

select *
from "user" as u
where exists
      (
        select *
        from "learning_group_program" as lp
          inner join "learning_group_user" as lu
          on lp.learning_group_id = lu.learning_group_id
        where lp.program_id = 1
        and lu.user_id = u.id
      );

Fiddle: https://www.db-fiddle.com/f/t7yFyQEc9KYtLKepuQMb6G/0

I used integer as ID, writing full GUID is just too cumbersome.

  • Related