Home > OS >  Compare of imported databasea - (Fingerprinting possible?)
Compare of imported databasea - (Fingerprinting possible?)

Time:10-20

Source: MS Access on Windows network share

Target: MySQL/MariaDB on Ubuntu

Tools: mdb-export, mysqlimport

record count: 1,5 Mio

I wonder if there is a fast and reliable way of comparing the imported data records.

Is there an SQL standard equivalent to e.g. md5 fingerprint hashes of files? Right now, I am building different import routines and I only want to fast check for similarity and (if failed) search for the detailed differences later on.

CodePudding user response:

A somewhat of a quick-and-dirty approach for individual columns can be implemented using stored aggregate functions which should be SQL standard.

This is how you'd do it with MariaDB:

CREATE AGGREGATE FUNCTION IF NOT EXISTS my_checksum(x TEXT) RETURNS CHAR(40)
DETERMINISTIC
BEGIN
 DECLARE cksum CHAR(40) DEFAULT SHA1('');
 DECLARE CONTINUE HANDLER FOR NOT FOUND
 RETURN cksum;
      LOOP
          FETCH GROUP NEXT ROW;
          SET cksum = SHA1(CONCAT(cksum, x));
      END LOOP;
END

You can then calculate a checksum from of a column as such:

MariaDB [test]> create or replace table t1(data varchar(20));
Query OK, 0 rows affected (0.063 sec)

MariaDB [test]> create or replace table t2(data varchar(20));
Query OK, 0 rows affected (0.064 sec)

MariaDB [test]> insert into t1 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.011 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> insert into t2 values ('Hello'), ('World'), ('!');
Query OK, 3 rows affected (0.015 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> select my_checksum(data) from t1;
 ------------------------------------------ 
| my_checksum(data)                        |
 ------------------------------------------ 
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
 ------------------------------------------ 
1 row in set (0.001 sec)

MariaDB [test]> select my_checksum(data) from t2;
 ------------------------------------------ 
| my_checksum(data)                        |
 ------------------------------------------ 
| 5f683ea3674e33ce24bff5f68f53509566ad4da2 |
 ------------------------------------------ 
1 row in set (0.001 sec)

MariaDB [test]> delete from t2;
Query OK, 3 rows affected (0.011 sec)

MariaDB [test]> insert into t2 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.012 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> select my_checksum(data) from t2;
 ------------------------------------------ 
| my_checksum(data)                        |
 ------------------------------------------ 
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
 ------------------------------------------ 
1 row in set (0.001 sec)

This of course relies on the SHA1 of the column being the same on all databases. Conversions into strings should make it mostly compatible but there might be differences in how these are implemented in different databases.

CodePudding user response:

The percona toolkit has the tool you need.

https://docs.percona.com/percona-toolkit/

See pt-table-checksum and pt-table-sync

CodePudding user response:

I found it.

It's very simple and very fast.

CHECKSUM TABLE tbl_name

Will give you a number value to compare.

And it's Transact-SQL so will hopefully work the same on MS Access, MySQL and MariaDB

  • Related