I have a postgresql table contains a list of email addresses. The table has three columns, Email, EmailServer (e.g., gmail.com, outlook.com, msn.com, and yahoo.com.ca etc.), and Valid (boolean).
Now, I want to group those emails by EmailServer and then update the first 3 records of each large group (count >=6) as Valid = true while leaving the rest of each group as Valid = false.
I failed to get the wanted output by below query:
UPDATE public."EmailContacts"
SET "Valid"=true
WHERE "EmailServer" IN (
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
LIMIT 5)
Please help to modify so as to get the expected results. Would be greatly appreciated for any kind of your help!
CodePudding user response:
WITH major_servers AS (
SELECT email_server
FROM email_address
GROUP by email_server
HAVING count(*) >=6
),
enumerated_emails AS (
SELECT email,
email_server,
row_number() OVER (PARTITION BY email_server ORDER BY email) AS row_number --TODO:: ORDER BY email - attention
FROM email_address
WHERE email_server IN (SELECT email_server FROM major_servers)
)
UPDATE email_address
SET valid = true
WHERE email IN (SELECT email
FROM enumerated_emails ee
WHERE ee.row_number <= 3);
- The first query
major_servers
finds major groups where more than 5 email servers exist. - The second query
enumerated_emails
enumerates emails by their natural order (see a TODO comment, I think you should choose another ORDER BY criteria) which belong to major groups using window functionrow_number()
. - The last query updates the first 3 rows in each major server group.
Find the sql-fiddle here.
CodePudding user response:
You need to get the servers, then order the mails from which one and then perform the update. Something like this:
WITH DataSourceServers AS
(
SELECT "EmailServer"
FROM public."EmailContacts"
GROUP by "EmailServer"
HAVING count(*) >=6
),DataSourceEmails AS
(
SELECT "Email", row_number() OVER (PARTITION BY "EmailServer" ORDER BY "Email") AS rn
FROM public."EmailContacts"
WHERE "EmailServer" IN (SELECT "EmailServer" FROM DataSourceServers)
)
UPDATE public."EmailContacts"
SET "Valid" = true
FROM public."EmailContacts" E
INNER JOIN DataSourceEmails SE
WHERE E."EmailServer" = SE."EmailServer"
AND E."Email" = SE."Email"
AND SE.rn <= 3;