Home > Net >  How can I get duplicate rows that have different values in another field
How can I get duplicate rows that have different values in another field

Time:08-01

I am struggling a little bit with improving this query:

SELECT
    pm.miembro_documento, COUNT(*) 
FROM 
    PostulantesMiembros pm 
GROUP BY 
    miembro_documento 
HAVING
    COUNT(*) > 1;

This query is showing me the people in PostulantesMiembros whose identification (miembro_documento) is 2 times or more (repeated registers) like so:

miembro_documento count
71850071 2

now for your understanding of what I wish to do, if I search by the identification (miembro_documento) in PostulantesMiembros, by for example the 71850071 you can see on the table, I get this result:

miembro_id miembro_nombres miembro_documento miembro_postulante
27224 Roberto 71850071 8283
29658 Roberto 71850071 9042

As you can see the same person is here 2 times, which means in the context of the application that he has 2 certificates associated to him, having this in mind now there is a column in PostulantesMiembros that is called "miembro_postulante", this column is the foreign key to the table "Postulantes" which primary key is "postulante_id", this table is the certificates, here in Postulantes there is a field called "postulante_postulacion_fecha" which is the date of the asignation/opening of the certificate, now what I wish to do is to get the registers that are duplicated by member document in PostulantMembers (this is what the query at the begining of question is doing) but also have the same date (postulante_postulacion_fecha) in Postulantes (this second condition is what im struggling to achieve), because I want to see if apart from being duplicated because he opened or got asigned 2 certificates, it got done on the same date, to check if this is possible to happen or has happened before.

What I have tried is:

SELECT
    pm.miembro_documento, p.postulante_postulacion_fecha, 
    COUNT(*)  
-- tables we need
FROM 
    PostulantesMiembros pm, Postulantes p
-- the relationship between the tables, I'm not sure if this AND is doing something (Postulantes does not have member_document field)
WHERE 
    pm.miembro_postulante = p.postulante_id 
    AND pm.miembro_documento = pm.miembro_documento
    -- I though that by doing p.postulante_postulacion_fecha = p.postulante_postulacion_fecha I would get registers that also had same date but not working
    AND p.postulante_postulacion_fecha = p.postulante_postulacion_fecha
GROUP BY 
    pm.miembro_documento, p.postulante_postulacion_fecha 
HAVING  
    COUNT(*) > 1;

An expected result would be:

miembro_documento postulante_postulacion_fecha count
71850071 2007-04-27 00:00:00.000 2

Here I could see the member document that is reapeated and the date of both certifications (this date is the same in both certifications)

So for this expected result to happen, I think that the data in Postulantes table should be like this for example:

postulante_id postulante_postulacion_fecha
8283 2007-04-27 00:00:00.000
9042 2007-04-27 00:00:00.000

And on PostulantesMiembros table, it should look like this:

miembro_id miembro_nombres miembro_documento miembro_postulante
27224 Roberto 71850071 8283
29658 Roberto 71850071 9042

I would appreciate alot any help, please tell me if you need any more context or have any questions

EDIT (I will try to explain what is wrong with the current query, it is bringing valid results (like the desired one above this EDIT), but also not valid results like the ones im going to explain):

The query is bringing results like this:

miembro_documento postulante_postulacion_fecha count
71290588 2002-07-30 00:00:00.000 2

But when I check how this data looks in PostulantesMiembros (Serch by miembro_documento 71290588):

miembro_id miembro_nombres miembro_documento miembro_postulante
27224 David 71850071 4355
29658 Lilibeth 71850071 4355

And in Postulantes it looks like this:

postulante_id postulante_postulacion_fecha
4355 2002-07-30 00:00:00.000

As you can see I have David and Lilibeth, yes both with the same identification number (miembro_documento), and thats not the problem with the query (you could also imagine that both registers have the name David if you want), but the problem is that the I need it to only show results if the miembro_postulante column is different, like lets suppose David has 4355, then Lilibeth should have 4356, for me to then check at Postulantes and find out that postulante_postulacion_fecha is the same on both 4356 and 4355. If they both have the same miembro_postulante then this is not going to be useful... I hope I explained myself well, feel free to ask for more input.

What is expected?? For it to be a valid result, then when searching by miembro_documento on PostulantesMiembros 71290588, I should see something like this:

miembro_id miembro_nombres miembro_documento miembro_postulante
27224 David 71850071 4355
29658 Lilibeth 71850071 4356

Notice miembro_postulante is different, and then when searching at Postulantes table with id 4355, and then with id 4356, I should find out that they both have the same postulante_postulacion_fecha (Date):

postulante_id postulante_postulacion_fecha
4355 2002-07-30 00:00:00.000
postulante_id postulante_postulacion_fecha
4356 2002-07-30 00:00:00.000

CodePudding user response:

Schema (MySQL v8.0)

CREATE TABLE PostulantesMiembros (
  `miembro_id` INTEGER,
  `miembro_nombres` VARCHAR(7),
  `miembro_documento` INTEGER,
  `miembro_postulante` INTEGER
);

INSERT INTO PostulantesMiembros
  (`miembro_id`, `miembro_nombres`, `miembro_documento`, `miembro_postulante`)
VALUES
  ('27224', 'Roberto', '71850071', '8283'),
  ('29658', 'Roberto', '71850071', '9042');

CREATE TABLE Postulantes (
  `postulante_id` INTEGER,
  `postulante_postulacion_fecha` DATETIME
);

INSERT INTO Postulantes
  (`postulante_id`, `postulante_postulacion_fecha`)
VALUES
  ('8283', '2007-04-27 00:00:00.000'),
  ('9042', '2007-04-27 00:00:00.000');

Query

SELECT miembro_documento
     , postulante_postulacion_fecha
     , COUNT(*)
  FROM PostulantesMiembros pm
  JOIN Postulantes p
    ON pm.miembro_postulante = p.postulante_id
 GROUP BY miembro_documento, postulante_postulacion_fecha
HAVING COUNT(*) > 1;
miembro_documento postulante_postulacion_fecha COUNT(*)
71850071 2007-04-27 00:00:00 2

View on DB Fiddle

CodePudding user response:

It appears you want a condition ensuring that the number of distinct miembro_postulante values per group is more than one.

After cleaning up your code by using modern join syntax, and removing redundant predicates, we can use COUNT(DISTINCT

SELECT
    pm.miembro_documento,
    p.postulante_postulacion_fecha, 
    COUNT(*)  
FROM 
    PostulantesMiembros pm
JOIN Postulantes p ON pm.miembro_postulante = p.postulante_id
WHERE pm.miembro_documento IS NOT NULL
GROUP BY 
    pm.miembro_documento, p.postulante_postulacion_fecha 
HAVING  
    COUNT(DISTINCT pm.miembro_postulante) > 1;

db<>fiddle

A possibly more efficient version compares the min and max.

SELECT
    pm.miembro_documento,
    p.postulante_postulacion_fecha, 
    COUNT(*)  
FROM 
    PostulantesMiembros pm
JOIN Postulantes p ON pm.miembro_postulante = p.postulante_id
WHERE pm.miembro_documento IS NOT NULL
GROUP BY 
    pm.miembro_documento, p.postulante_postulacion_fecha 
HAVING  
    MIN(pm.miembro_postulante) <> MAX(pm.miembro_postulante);
  • Related