Home > other >  How to ensure that the duplicates are deleted in a table before joining tables in SQLite
How to ensure that the duplicates are deleted in a table before joining tables in SQLite

Time:08-12

In SQLite, I am trying to combine 2 tables to create a new table. I checked for duplicate rows in Table 1 and Table 2. Table 1 had no duplicate rows, while Table 2 had duplicate rows. I deleted the duplicate rows in Table 2 and renamed it to newTable2. When I am combining Table 1 and newTable2 to create Table3, I checked for duplicates. Yet, after joining the tables, it showed that there are duplicates. What other code do I need to add to ensure that the duplicates are deleted before creating Table3? I hope my question is clear as I am new here.

/*check duplicates*/
SELECT *, COUNT(*) AS Count
FROM Table1
GROUP BY PTID, DIAG_DATE, DIAG_TIME
HAVING COUNT(*)>1
;
/*no duplicates shown*/

SELECT *, COUNT(*) AS Count
FROM Table2
GROUP BY PTID, RESULT_DATE, RESULT_TIME, TEST_RESULT
HAVING COUNT(*)>1
;
/*duplicates shown*/
delete from Table2
where rowid not in (select min(rowid)
from lab
group by PTID, RESULT_DATE, RESULT_TIME, TEST_RESULT)
;
ALTER TABLE Table2
RENAME TO newTable2
;

CREATE TABLE Table3 AS
SELECT *
FROM Table1 JOIN newTable2 ON Table1.PTID=newTable2.PTID

CodePudding user response:

In first select you're searching for distinct (GROUP BY) rows but distinct according to THREE columns (PTID, DIAG_DATE, DIAG_TIME). In second query again distinct but according to FOUR columns (PTID, RESULT_DATE, RESULT_TIME, TEST_RESULT). But then you want to JOIN two tables according to ONLY ONE column PTID.

If table has distinct records according to PTID, DIAG_DATE, DIAG_TIME does not mean that it is distinct according to PTID column only!

You must have distinct record according to only one column if you want to join them according to that one column or you have to create complex JOIN on more columns:

SELECT t1.*,t2.* 
FROM Table1 t1
INNER JOIN Table2 t2
    ON t1.PTID = t2.PTID
       AND t1.DIAG_DATE = t2.DIAG_DATE 
       AND t1.RESULT_TIME = t2.RESULT_TIME
  • Related