Home > OS >  Postgresql query: update status of limit number of records based on group size
Postgresql query: update status of limit number of records based on group size

Time:01-11

I have a postgresql table contains a list of email addresses. The table has three columns, Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, and yahoo.com.ca etc.), and Valid (boolean).

Now, I want to group those emails by EmailServer and then update the first 3 records of each large group (count >=6) as Valid = true while leaving the rest of each group as Valid = false.

I failed to get the wanted output by below query:

UPDATE public."EmailContacts"   
SET "Valid"=true
WHERE "EmailServer" IN (
    SELECT "EmailServer"
    FROM public."EmailContacts"
    GROUP by "EmailServer"
    HAVING count(*) >=6
    LIMIT 5)

Please help to modify so as to get the expected results. Would be greatly appreciated for any kind of your help!

CodePudding user response:

WITH major_servers AS (
    SELECT email_server
    FROM email_address
    GROUP by email_server
    HAVING count(*) >=6
),
enumerated_emails AS (
    SELECT email,
           email_server,
           row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
    FROM email_address
    WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
                FROM enumerated_emails ee
                WHERE ee.row_number <= 3);
  • The first query major_servers finds major groups where more than 5 email servers exist.
  • The second query enumerated_emails enumerates emails by their natural order (see a TODO comment, I think you should choose another ORDER BY criteria) which belong to major groups using window function row_number().
  • The last query updates the first 3 rows in each major server group.

Find the sql-fiddle here.

CodePudding user response:

You need to get the servers, then order the mails from which one and then perform the update. Something like this:

WITH DataSourceServers AS 
(
    SELECT "EmailServer"
    FROM public."EmailContacts"
    GROUP by "EmailServer"
    HAVING count(*) >=6
),DataSourceEmails AS
(
    SELECT "Email", row_number() OVER (PARTITION BY "EmailServer" ORDER BY "Email") AS rn
    FROM public."EmailContacts"
    WHERE "EmailServer" IN (SELECT "EmailServer" FROM DataSourceServers)
)
UPDATE public."EmailContacts"
SET "Valid" = true
FROM  public."EmailContacts" E
INNER JOIN DataSourceEmails SE
WHERE E."EmailServer" = SE."EmailServer"
    AND E."Email" = SE."Email"
    AND SE.rn <= 3;
  • Related