I am struggling a little bit with improving this query:
SELECT
pm.miembro_documento, COUNT(*)
FROM
PostulantesMiembros pm
GROUP BY
miembro_documento
HAVING
COUNT(*) > 1;
This query is showing me the people in PostulantesMiembros whose identification (miembro_documento) is 2 times or more (repeated registers) like so:
miembro_documento | count |
---|---|
71850071 | 2 |
now for your understanding of what I wish to do, if I search by the identification (miembro_documento) in PostulantesMiembros, by for example the 71850071 you can see on the table, I get this result:
miembro_id | miembro_nombres | miembro_documento | miembro_postulante |
---|---|---|---|
27224 | Roberto | 71850071 | 8283 |
29658 | Roberto | 71850071 | 9042 |
As you can see the same person is here 2 times, which means in the context of the application that he has 2 certificates associated to him, having this in mind now there is a column in PostulantesMiembros that is called "miembro_postulante
", this column is the foreign key to the table "Postulantes
" which primary key is "postulante_id
", this table is the certificates, here in Postulantes there is a field called "postulante_postulacion_fecha
" which is the date of the asignation/opening of the certificate, now what I wish to do is to get the registers that are duplicated by member document in PostulantMembers
(this is what the query at the begining of question is doing) but also have the same date (postulante_postulacion_fecha
) in Postulantes
(this second condition is what im struggling to achieve), because I want to see if apart from being duplicated because he opened or got asigned 2 certificates, it got done on the same date, to check if this is possible to happen or has happened before.
What I have tried is:
SELECT
pm.miembro_documento, p.postulante_postulacion_fecha,
COUNT(*)
-- tables we need
FROM
PostulantesMiembros pm, Postulantes p
-- the relationship between the tables, I'm not sure if this AND is doing something (Postulantes does not have member_document field)
WHERE
pm.miembro_postulante = p.postulante_id
AND pm.miembro_documento = pm.miembro_documento
-- I though that by doing p.postulante_postulacion_fecha = p.postulante_postulacion_fecha I would get registers that also had same date but not working
AND p.postulante_postulacion_fecha = p.postulante_postulacion_fecha
GROUP BY
pm.miembro_documento, p.postulante_postulacion_fecha
HAVING
COUNT(*) > 1;
An expected result would be:
miembro_documento | postulante_postulacion_fecha | count |
---|---|---|
71850071 | 2007-04-27 00:00:00.000 | 2 |
Here I could see the member document that is reapeated and the date of both certifications (this date is the same in both certifications)
So for this expected result to happen, I think that the data in Postulantes
table should be like this for example:
postulante_id | postulante_postulacion_fecha |
---|---|
8283 | 2007-04-27 00:00:00.000 |
9042 | 2007-04-27 00:00:00.000 |
And on PostulantesMiembros
table, it should look like this:
miembro_id | miembro_nombres | miembro_documento | miembro_postulante |
---|---|---|---|
27224 | Roberto | 71850071 | 8283 |
29658 | Roberto | 71850071 | 9042 |
I would appreciate alot any help, please tell me if you need any more context or have any questions
EDIT (I will try to explain what is wrong with the current query, it is bringing valid results (like the desired one above this EDIT), but also not valid results like the ones im going to explain):
The query is bringing results like this:
miembro_documento | postulante_postulacion_fecha | count |
---|---|---|
71290588 | 2002-07-30 00:00:00.000 | 2 |
But when I check how this data looks in PostulantesMiembros (Serch by miembro_documento 71290588):
miembro_id | miembro_nombres | miembro_documento | miembro_postulante |
---|---|---|---|
27224 | David | 71850071 | 4355 |
29658 | Lilibeth | 71850071 | 4355 |
And in Postulantes it looks like this:
postulante_id | postulante_postulacion_fecha |
---|---|
4355 | 2002-07-30 00:00:00.000 |
As you can see I have David and Lilibeth, yes both with the same identification number (miembro_documento), and thats not the problem with the query (you could also imagine that both registers have the name David if you want), but the problem is that the I need it to only show results if the miembro_postulante
column is different, like lets suppose David has 4355, then Lilibeth should have 4356, for me to then check at Postulantes and find out that postulante_postulacion_fecha
is the same on both 4356 and 4355. If they both have the same miembro_postulante
then this is not going to be useful... I hope I explained myself well, feel free to ask for more input.
What is expected?? For it to be a valid result, then when searching by miembro_documento
on PostulantesMiembros
71290588, I should see something like this:
miembro_id | miembro_nombres | miembro_documento | miembro_postulante |
---|---|---|---|
27224 | David | 71850071 | 4355 |
29658 | Lilibeth | 71850071 | 4356 |
Notice miembro_postulante
is different, and then when searching at Postulantes table with id 4355, and then with id 4356, I should find out that they both have the same postulante_postulacion_fecha
(Date):
postulante_id | postulante_postulacion_fecha |
---|---|
4355 | 2002-07-30 00:00:00.000 |
postulante_id | postulante_postulacion_fecha |
---|---|
4356 | 2002-07-30 00:00:00.000 |
CodePudding user response:
Schema (MySQL v8.0)
CREATE TABLE PostulantesMiembros (
`miembro_id` INTEGER,
`miembro_nombres` VARCHAR(7),
`miembro_documento` INTEGER,
`miembro_postulante` INTEGER
);
INSERT INTO PostulantesMiembros
(`miembro_id`, `miembro_nombres`, `miembro_documento`, `miembro_postulante`)
VALUES
('27224', 'Roberto', '71850071', '8283'),
('29658', 'Roberto', '71850071', '9042');
CREATE TABLE Postulantes (
`postulante_id` INTEGER,
`postulante_postulacion_fecha` DATETIME
);
INSERT INTO Postulantes
(`postulante_id`, `postulante_postulacion_fecha`)
VALUES
('8283', '2007-04-27 00:00:00.000'),
('9042', '2007-04-27 00:00:00.000');
Query
SELECT miembro_documento
, postulante_postulacion_fecha
, COUNT(*)
FROM PostulantesMiembros pm
JOIN Postulantes p
ON pm.miembro_postulante = p.postulante_id
GROUP BY miembro_documento, postulante_postulacion_fecha
HAVING COUNT(*) > 1;
miembro_documento | postulante_postulacion_fecha | COUNT(*) |
---|---|---|
71850071 | 2007-04-27 00:00:00 | 2 |
CodePudding user response:
It appears you want a condition ensuring that the number of distinct miembro_postulante
values per group is more than one.
After cleaning up your code by using modern join syntax, and removing redundant predicates, we can use COUNT(DISTINCT
SELECT
pm.miembro_documento,
p.postulante_postulacion_fecha,
COUNT(*)
FROM
PostulantesMiembros pm
JOIN Postulantes p ON pm.miembro_postulante = p.postulante_id
WHERE pm.miembro_documento IS NOT NULL
GROUP BY
pm.miembro_documento, p.postulante_postulacion_fecha
HAVING
COUNT(DISTINCT pm.miembro_postulante) > 1;
A possibly more efficient version compares the min and max.
SELECT
pm.miembro_documento,
p.postulante_postulacion_fecha,
COUNT(*)
FROM
PostulantesMiembros pm
JOIN Postulantes p ON pm.miembro_postulante = p.postulante_id
WHERE pm.miembro_documento IS NOT NULL
GROUP BY
pm.miembro_documento, p.postulante_postulacion_fecha
HAVING
MIN(pm.miembro_postulante) <> MAX(pm.miembro_postulante);