Union 5 or more Tables with the same structure and primary key-CodePudding

SQL 5.7.37

I'm using this code to merge 2 sql tables into a third table. But now I need to do the same thing, with 5 or more tables.

create table table3 as
    select *
    from table1
    union all
    select *
    from table2
    where not exists (select 1 from table1 where table1.title = table2.title);

How do I add more tables to this command?

Here's an example set of data and the desired result:

Notice the tables have some rows that are exactly the same as other tables. I'm not sure if that matters.

Structure: ID (primary key, auto increment), Title (unique,index), DESC, URL

Ex. Table 1
ID, Title, DESC, URL
 1 Bob thisisbob bob.com
 2 Tom thisistom tom.com
 3 Chad thisischad chad.com

Ex. Table 2
ID, Title, DESC, URL
 1 Chris thisischris chris.com
 2 Chad thisischad chad.com
 3 Dough thisisdough doug.com

Ex. Table 3
ID, Title, DESC, URL
 1 Morgan thisismorgan morgan.com
 2 Jerome thisisjerome jerome.com
 3 Mike thisismike mike.com

Ex. Table 4
ID, Title, DESC, URL
 1 Chris thisischris chris.com
 2 Chad thisischad chad.com
 3 Luke thisisluke luke.com

Result:
What I need in Table 5
ID, Title, DESC, URL
 1 Bob thisisbob bob.com
 2 Tom thisistom tom.com
 3 Chad thisischad chad.com
 4 Chris thisischris chris.com
 5 Dough thisisdough doug.com
 6 Morgan thisismorgan morgan.com
 7 Jerome thisisjerome jerome.com
 8 Mike thisismike mike.com
 9 Luke thisisluke luke.com

How can I add more tables to my union sql command?

CodePudding user response：

An idea could be to postpone the generation of every unique value only after you've aggregated all your tables, as long as UNION_ALL is way faster than UNION, and you would do the DISTINCT operation once instead of five times in this way.

SET @cnt = 0;

SELECT 
    (@cnt := @cnt   1) AS rowNumber,
    distinct_aggr_tables.* 
FROM 
    ( 
    SELECT DISTINCT * FROM (
        SELECT `Title`, `DESC`, `url` FROM Table1
        UNION ALL
        SELECT `Title`, `DESC`, `url` FROM Table2
        UNION ALL
        SELECT `Title`, `DESC`, `url` FROM Table3
        UNION ALL
        SELECT `Title`, `DESC`, `url` FROM Table4
        ) aggr_tables
    ) distinct_aggr_tables

Here's the corresponding fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ea1fa1f668e769cc5b1824dcfb9feb40.

Can this solution work for you?

EDIT: How to select all fields except one from a MySQL table for this task

There are two interesting ways of doing this:

1) The first approach copies each table into different temporary tables, then using the ALTER statement to drop the column we're not interested in, hence using these tables with the first version of this code.

    # for each table
    CREATE TEMPORARY TABLE temp_Table1 AS
    SELECT * FROM Table1;

    ALTER TABLE temp_Table1
    DROP Id;

2) The second approach uses a prepared statement, which allows you to build the query as a string. This can help for this exercise because we may want to, retrieve all column names from INFORMATION_SCHEMA.COLUMNS table within a query, and then remove the field name we're not interested in, hence pass the list of column names to the original query.

    SET @sql = CONCAT(
        'CREATE OR REPLACE VIEW AllTables AS ', 
        'SELECT
            ROW_NUMBER() OVER(ORDER BY Title ASC) AS rowNumber,
            distinct_aggr_tables.* 
         FROM 
            (',
            'SELECT DISTINCT 
                * 
            FROM 
                (
                SELECT ',
                    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
                     FROM INFORMATION_SCHEMA.COLUMNS cols
                     WHERE cols.TABLE_NAME = 'Table1' AND cols.TABLE_SCHEMA = 'test'),
                'FROM 
                    Table1
                UNION ALL
                SELECT ',
                    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
                     FROM INFORMATION_SCHEMA.COLUMNS cols
                     WHERE cols.TABLE_NAME = 'Table2' AND cols.TABLE_SCHEMA = 'test'),
                'FROM 
                    Table2
                UNION ALL
                SELECT ',
                    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
                     FROM INFORMATION_SCHEMA.COLUMNS cols
                     WHERE cols.TABLE_NAME = 'Table3' AND cols.TABLE_SCHEMA = 'test'),
                'FROM 
                    Table3
                UNION ALL
                SELECT ',
                    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
                     FROM INFORMATION_SCHEMA.COLUMNS cols
                     WHERE cols.TABLE_NAME = 'Table4' AND cols.TABLE_SCHEMA = 'test'),
                'FROM 
                    Table4

            ) aggr_tables
        ) distinct_aggr_tables;'
    );
                 
    PREPARE stmt FROM @sql;
    EXECUTE stmt;

    SELECT * FROM AllTables;

Note that this code reproduces exactly the first code of this post, except for the fact that uses a ROW_NUMBER window function instead of a global variable that updates itself.

This solution makes some assumptions, according to which it should be carefully quick-fixed:

the tables are exactly 4: in order to change this amount, it's necessary to replicate the following code in the right place for each new table:

   SELECT ',
      (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
       FROM INFORMATION_SCHEMA.COLUMNS cols
       WHERE cols.TABLE_NAME = <new_table_name> AND cols.TABLE_SCHEMA = 'test'),
  'FROM 
      <new_table_name>

the current table names are Table1, Table2, Table3, Table4 and the database name is test: these references should be replaced when we're looking for the field names of a specific table (filtering by table name and database name):
```
 SELECT '
     (SELECT CONCAT ...
      FROM ...
      WHERE cols.TABLE_NAME = <table_name> AND cols.TABLE_SCHEMA = <db_name>),
'FROM 
     <table_name>
```

the field name to remove is 'Id' and it is found as first column of all the tables: if the name is different, it's necessary to change its name during the removal of this column. Moreover if this is not the first column, some tweaks are needed here:

  # COLUMN_NAME: 
  # ['Id', 'Title', 'DESC', 'url']
  #
  # GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'):
  # 'Id`,`Title`,`DESC`,`url'
  # 
  # REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', '')
  # '`Title`,`DESC`,`url'
  # 
  # CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
  # '`Title`,`DESC`,`url`'

(backticks are added to avoid exceptions due to DESC)

Note1: The generation of the four SELECT groups for each table may be automated (a simple example at the bottom of this page) by cycling with a variable on the table names contained in INFORMATION_SCHEMA.TABLES. Yet I wouldn't venture forth that path as it becomes difficult to deal with the string text to be evaluated with the prepared statement and the CONCAT of a calculated value from another table (INFORMATION_SCHEMA.COLUMNS cols).

Note2: Couldn't see the effects of this code within sql fiddles because wasn't able to access the INFORMATION_SCHEMA db tables. The code has been tested offline on a MySQL 8.0 database.

The first approach can be quite memory expensive, while the second one may be more efficient if handled carefully in the fixes to tailor your db.

Overall no perfect solution, though some that may solve your problem.

ps: any suggested edits to improve this post are more than welcome.