Home > Software design >  How can I make a union of temporary tables with JOIN statement?
How can I make a union of temporary tables with JOIN statement?

Time:11-15

Through "WITH" statement I have obtained three temporary tables: "october_fall10", "november_fall11" and "december_fall12" Now I want to perform an INNER JOIN (the Query below) with them through the variable "member_casual", but instead I receive the following alert Error message: Table name "november_fall11" missing dataset while no default dataset is set in the request. Regardless that I wanted to do it with "december_fall12" the alert is repeated, but in reference to "december_fall12"

--This is Q1 from where I've got the 3 tables --

WITH october_fall10 AS
(SELECT 
   start_station_name,
   end_station_name,
   start_station_id,
   end_station_id,
   EXTRACT (DATE FROM started_at) AS start_date,
   EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
   EXTRACT (TIME FROM started_at) AS start_time,    
   EXTRACT (DATE FROM ended_at) AS end_date,
   EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
   EXTRACT (TIME FROM ended_at) AS end_time,
   DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
   member_casual
FROM 
   `ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
ORDER BY 
   started_at DESC)
SELECT
   member_casual,
   start_week_date,
   COUNT (member_casual) AS member_casual_start,
   TIME(
   EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
   EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
   EXTRACT(second FROM AVG(start_time - '0:0:0'))
   ) AS avg_start_time
FROM 
   october_fall10
GROUP BY
   start_week_date,
   member_casual
ORDER BY 
   start_week_date DESC;  
WITH november_fall11 AS
(SELECT
   start_station_name,
   end_station_name,
   start_station_id,
   end_station_id,
   EXTRACT (DATE FROM started_at) AS start_date,
   EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
   EXTRACT (TIME FROM started_at) AS start_time,    
   EXTRACT (DATE FROM ended_at) AS end_date,
   EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
   EXTRACT (TIME FROM ended_at) AS end_time,
   DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
   member_casual
FROM
   `ciclystic.cyclistic_seasonal_analysis.fall_202011` AS fall_analysis11 
ORDER BY 
   started_at DESC)
SELECT
   member_casual,
   start_week_date,
   COUNT (member_casual) AS member_casual_start,
   TIME(
   EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
   EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
   EXTRACT(second FROM AVG(start_time - '0:0:0'))
   ) AS avg_start_time
FROM 
   november_fall11
GROUP BY
   start_week_date,
   member_casual
ORDER BY 
   start_week_date DESC;
WITH december_fall12 AS
(SELECT
   start_station_name,
   end_station_name,
   start_station_id,
   end_station_id,
   EXTRACT (DATE FROM started_at) AS start_date,
   EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
   EXTRACT (TIME FROM started_at) AS start_time,    
   EXTRACT (DATE FROM ended_at) AS end_date,
   EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
   EXTRACT (TIME FROM ended_at) AS end_time,
   DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
   member_casual
FROM
   `ciclystic.cyclistic_seasonal_analysis.fall_202012` AS fall_analysis11 
ORDER BY 
   started_at DESC)
SELECT
   member_casual,
   start_week_date,
   COUNT (member_casual) AS member_casual_start,
   TIME(
   EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
   EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
   EXTRACT(second FROM AVG(start_time - '0:0:0'))
   ) AS avg_start_time
FROM 
   december_fall12
GROUP BY
   start_week_date,
   member_casual
ORDER BY 
   start_week_date DESC; 

--This is Q2 from where I want to get the full combination--

SELECT
       october_fall10.member_casual,
       october_fall10.start_week_date,
       october_fall10.member_casual_start,
       october_fall10.avg_start_time,
       november_fall11.member_casual,
       november_fall11.start_week_date,
       november_fall11.member_casual_start,
       november_fall11.avg_start_time,
       december_fall12.member_casual,
       december_fall12.start_week_date,
       december_fall12.member_casual_start,
       december_fall12.avg_start_time
   FROM
       ( october_fall10 JOIN (november_fall11 JOIN december_fall12 USING (member_casual)) 
       USING 
       (member_casual) )

CodePudding user response:

First, when you create a 'table' using WITH, it's not a temporary table, it's a Common Table Expression (CTE). It's useful to understand the different terminology, as they behave in different ways (much like confusing petrol and diesel doesn't go so well).

One of the ways that they differ is that a CTE is no persisted in any way. Once you make your final query, the CTE goes out of scope and can not be referenced again (after the ;, the CTE is 'gone').

This means that the pattern you want is...

WITH
  october_fall10 AS
(
  SELECT 
    start_station_name,
    end_station_name,
    start_station_id,
    end_station_id,
    EXTRACT (DATE FROM started_at) AS start_date,
    EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
    EXTRACT (TIME FROM started_at) AS start_time,    
    EXTRACT (DATE FROM ended_at) AS end_date,
    EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
    EXTRACT (TIME FROM ended_at) AS end_time,
    DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
    member_casual
  FROM 
    `ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
),
  october_fall10_aggregate AS
(
  SELECT
     member_casual,
     start_week_date,
     COUNT(member_casual) AS member_casual_start,
     TIME(
       EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
       EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
       EXTRACT(second FROM AVG(start_time - '0:0:0'))
     )
       AS avg_start_time
  FROM 
     october_fall10
  GROUP BY
     start_week_date,
     member_casual
),
  november_fall11 AS
(
  <your query here>
),
  november_fall11_aggregate AS
(
  <your query here>
),
  december_fall12 AS
(
  <your query here>
),
  december_fall12_aggregate AS
(
  <your query here>
),
SELECT
  october_fall10_aggregate.member_casual,
  october_fall10_aggregate.start_week_date,
  october_fall10_aggregate.member_casual_start,
  october_fall10_aggregate.avg_start_time,
  november_fall11_aggregate.member_casual,
  november_fall11_aggregate.start_week_date,
  november_fall11_aggregate.member_casual_start,
  november_fall11_aggregate.avg_start_time,
  december_fall12_aggregate.member_casual,
  december_fall12_aggregate.start_week_date,
  december_fall12_aggregate.member_casual_start,
  december_fall12_aggregate.avg_start_time
FROM
  october_fall10_aggregate
JOIN
  november_fall11_aggregate
    USING (member_casual)
JOIN
  december_fall12_aggregate
    USING (member_casual)

This creates new CTEs with a _aggregate suffix, and then references those in your final query.

  • Note that I also removed the ORDER BY from each CTE
  • This is because CTEs (and table, temporary or otherwise) do not retain that ordering

Also, repeating all that code is insane. You're much better off with all the data in a single table and writing the code once. Failing that you can union the tables 'as-if' they're one table...

WITH
  unioned AS
(
  SELECT 202010 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202010`
  UNION ALL
  SELECT 202011 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202011`
  UNION ALL
  SELECT 202012 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202012`
),
  extraction AS
(
  SELECT 
    source_tbl,
    member_casual,
    start_station_name,
    end_station_name,
    start_station_id,
    end_station_id,
    EXTRACT (DATE FROM started_at) AS start_date,
    EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
    EXTRACT (TIME FROM started_at) AS start_time,    
    EXTRACT (DATE FROM ended_at) AS end_date,
    EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,    
    EXTRACT (TIME FROM ended_at) AS end_time,
    DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_length
  FROM 
    unioned
)
SELECT
  member_casual,
  source_tbl,
  start_week_date,
  COUNT(member_casual) AS member_casual_start,
  TIME(
    EXTRACT(hour   FROM AVG(start_time - '0:0:0')), 
    EXTRACT(minute FROM AVG(start_time - '0:0:0')), 
    EXTRACT(second FROM AVG(start_time - '0:0:0'))
  )
    AS avg_start_time
FROM 
  extraction
GROUP BY
  member_casual,
  source_tbl,
  start_week_date
ORDER BY
  member_casual,
  source_tbl,
  start_week_date

This is NOT the same format as you had before, but is much more in keeping with the intended patterns for SQL.

  • Related