Home > Software design >  Combine two SQL lite databases with Python
Combine two SQL lite databases with Python

Time:09-30

I have the following code in python to update db where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:

con = lite.connect('test_score.db')
with con:
    cur = con.cursor()
    cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first,last,score))
    item = cur.fetchone()
    on.commit()
    cur.close()
    con.close()

I get table "scores" with following data:

1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88

Two different users (userA and userB) copy test_score.db and code to their computer and use it separately. I get back two db test_score.db but now with different content:

user A test_score.db :

1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15

user A test_score.db :

1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15

I was trying to use

insert into AuditRecords select * from toMerge.AuditRecords; 

to combine two db into one but failed as the first column is a unique id. Two db have now the same ids but with different or the same data and merging is failing. I would like to find unique rows in both db (all values different ignoring id) and merge results to one full db.

Result should be something like this:

1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09

I can extract each value one by one and compare but want to avoid it as I might have longer rows in the future with more columns.

Sorry if it is obvious and easy questions, I'm still learning. I tried to find the answer but failed, please point me to answer if it already exists somewhere else. Thank you very much for your help.

CodePudding user response:

You need to define the approach to resolve duplicated rows. Will consider the max score? The min? The first one?

Considering the table AuditRecords has all the lines of both User A and B, you can use GROUP BY to deduplicate rows and use an aggregation function to resolve the score:

insert into
    AuditRecords
select
    id,
    first_name,
    last_name,
    max(score) as score
from
    toMerge.AuditRecords
group by
    id,
    first_name,
    last_name;

CodePudding user response:

For this requirement you should have defined a UNIQUE constraint for the combination of the columns first, last and score:

CREATE TABLE AuditRecords(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  first TEXT,
  last TEXT,
  score INTEGER,
  UNIQUE(first, last, score)
);

Now you can use INSERT OR IGNORE to merge the tables:

INSERT OR IGNORE INTO AuditRecords(first, last, score) 
SELECT first, last, score 
FROM toMerge.AuditRecords;

Note that you must explicitly define the list of the columns that will receive the values and in this list the id is missing because its value will be autoincremented by each insertion.

Another way to do it without defining the UNIQUE constraint is to use EXCEPT:

INSERT INTO AuditRecords(first, last, score) 
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords
  • Related