SQL 5.7.37
I'm using this code to merge 2 sql tables into a third table. But now I need to do the same thing, with 5 or more tables.
create table table3 as
select *
from table1
union all
select *
from table2
where not exists (select 1 from table1 where table1.title = table2.title);
How do I add more tables to this command?
Here's an example set of data and the desired result:
Notice the tables have some rows that are exactly the same as other tables. I'm not sure if that matters.
Structure: ID (primary key, auto increment), Title (unique,index), DESC, URL
Ex. Table 1
ID, Title, DESC, URL
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com
Ex. Table 2
ID, Title, DESC, URL
1 Chris thisischris chris.com
2 Chad thisischad chad.com
3 Dough thisisdough doug.com
Ex. Table 3
ID, Title, DESC, URL
1 Morgan thisismorgan morgan.com
2 Jerome thisisjerome jerome.com
3 Mike thisismike mike.com
Ex. Table 4
ID, Title, DESC, URL
1 Chris thisischris chris.com
2 Chad thisischad chad.com
3 Luke thisisluke luke.com
Result:
What I need in Table 5
ID, Title, DESC, URL
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com
4 Chris thisischris chris.com
5 Dough thisisdough doug.com
6 Morgan thisismorgan morgan.com
7 Jerome thisisjerome jerome.com
8 Mike thisismike mike.com
9 Luke thisisluke luke.com
How can I add more tables to my union sql command?
CodePudding user response:
An idea could be to postpone the generation of every unique value only after you've aggregated all your tables, as long as UNION_ALL
is way faster than UNION
, and you would do the DISTINCT
operation once instead of five times in this way.
SET @cnt = 0;
SELECT
(@cnt := @cnt 1) AS rowNumber,
distinct_aggr_tables.*
FROM
(
SELECT DISTINCT * FROM (
SELECT `Title`, `DESC`, `url` FROM Table1
UNION ALL
SELECT `Title`, `DESC`, `url` FROM Table2
UNION ALL
SELECT `Title`, `DESC`, `url` FROM Table3
UNION ALL
SELECT `Title`, `DESC`, `url` FROM Table4
) aggr_tables
) distinct_aggr_tables
Here's the corresponding fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ea1fa1f668e769cc5b1824dcfb9feb40.
Can this solution work for you?
EDIT: How to select all fields except one from a MySQL table for this task
There are two interesting ways of doing this:
1) The first approach copies each table into different temporary tables, then using the ALTER
statement to drop the column we're not interested in, hence using these tables with the first version of this code.
# for each table
CREATE TEMPORARY TABLE temp_Table1 AS
SELECT * FROM Table1;
ALTER TABLE temp_Table1
DROP Id;
2) The second approach uses a prepared statement, which allows you to build the query as a string. This can help for this exercise because we may want to, retrieve all column names from INFORMATION_SCHEMA.COLUMNS
table within a query, and then remove the field name we're not interested in, hence pass the list of column names to the original query.
SET @sql = CONCAT(
'CREATE OR REPLACE VIEW AllTables AS ',
'SELECT
ROW_NUMBER() OVER(ORDER BY Title ASC) AS rowNumber,
distinct_aggr_tables.*
FROM
(',
'SELECT DISTINCT
*
FROM
(
SELECT ',
(SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
FROM INFORMATION_SCHEMA.COLUMNS cols
WHERE cols.TABLE_NAME = 'Table1' AND cols.TABLE_SCHEMA = 'test'),
'FROM
Table1
UNION ALL
SELECT ',
(SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
FROM INFORMATION_SCHEMA.COLUMNS cols
WHERE cols.TABLE_NAME = 'Table2' AND cols.TABLE_SCHEMA = 'test'),
'FROM
Table2
UNION ALL
SELECT ',
(SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
FROM INFORMATION_SCHEMA.COLUMNS cols
WHERE cols.TABLE_NAME = 'Table3' AND cols.TABLE_SCHEMA = 'test'),
'FROM
Table3
UNION ALL
SELECT ',
(SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
FROM INFORMATION_SCHEMA.COLUMNS cols
WHERE cols.TABLE_NAME = 'Table4' AND cols.TABLE_SCHEMA = 'test'),
'FROM
Table4
) aggr_tables
) distinct_aggr_tables;'
);
PREPARE stmt FROM @sql;
EXECUTE stmt;
SELECT * FROM AllTables;
Note that this code reproduces exactly the first code of this post, except for the fact that uses a ROW_NUMBER
window function instead of a global variable that updates itself.
This solution makes some assumptions, according to which it should be carefully quick-fixed:
the tables are exactly 4: in order to change this amount, it's necessary to replicate the following code in the right place for each new table:
SELECT ', (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ') FROM INFORMATION_SCHEMA.COLUMNS cols WHERE cols.TABLE_NAME = <new_table_name> AND cols.TABLE_SCHEMA = 'test'), 'FROM <new_table_name>
the current table names are
Table1
,Table2
,Table3
,Table4
and the database name istest
: these references should be replaced when we're looking for the field names of a specific table (filtering by table name and database name):SELECT ' (SELECT CONCAT ... FROM ... WHERE cols.TABLE_NAME = <table_name> AND cols.TABLE_SCHEMA = <db_name>), 'FROM <table_name>
the field name to remove is 'Id' and it is found as first column of all the tables: if the name is different, it's necessary to change its name during the removal of this column. Moreover if this is not the first column, some tweaks are needed here:
# COLUMN_NAME: # ['Id', 'Title', 'DESC', 'url'] # # GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'): # 'Id`,`Title`,`DESC`,`url' # # REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', '') # '`Title`,`DESC`,`url' # # CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ') # '`Title`,`DESC`,`url`'
(backticks are added to avoid exceptions due to DESC
)
Note1: The generation of the four SELECT groups for each table may be automated (a simple example at the bottom of this page) by cycling with a variable on the table names contained in INFORMATION_SCHEMA.TABLES
. Yet I wouldn't venture forth that path as it becomes difficult to deal with the string text to be evaluated with the prepared statement and the CONCAT of a calculated value from another table (INFORMATION_SCHEMA.COLUMNS cols
).
Note2: Couldn't see the effects of this code within sql fiddles because wasn't able to access the INFORMATION_SCHEMA
db tables. The code has been tested offline on a MySQL 8.0 database.
The first approach can be quite memory expensive, while the second one may be more efficient if handled carefully in the fixes to tailor your db.
Overall no perfect solution, though some that may solve your problem.
ps: any suggested edits to improve this post are more than welcome.