Through "WITH" statement I have obtained three temporary tables: "october_fall10", "november_fall11" and "december_fall12" Now I want to perform an INNER JOIN (the Query below) with them through the variable "member_casual", but instead I receive the following alert Error message: Table name "november_fall11" missing dataset while no default dataset is set in the request. Regardless that I wanted to do it with "december_fall12" the alert is repeated, but in reference to "december_fall12"
--This is Q1 from where I've got the 3 tables --
WITH october_fall10 AS
(SELECT
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
member_casual
FROM
`ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
ORDER BY
started_at DESC)
SELECT
member_casual,
start_week_date,
COUNT (member_casual) AS member_casual_start,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
) AS avg_start_time
FROM
october_fall10
GROUP BY
start_week_date,
member_casual
ORDER BY
start_week_date DESC;
WITH november_fall11 AS
(SELECT
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
member_casual
FROM
`ciclystic.cyclistic_seasonal_analysis.fall_202011` AS fall_analysis11
ORDER BY
started_at DESC)
SELECT
member_casual,
start_week_date,
COUNT (member_casual) AS member_casual_start,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
) AS avg_start_time
FROM
november_fall11
GROUP BY
start_week_date,
member_casual
ORDER BY
start_week_date DESC;
WITH december_fall12 AS
(SELECT
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
member_casual
FROM
`ciclystic.cyclistic_seasonal_analysis.fall_202012` AS fall_analysis11
ORDER BY
started_at DESC)
SELECT
member_casual,
start_week_date,
COUNT (member_casual) AS member_casual_start,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
) AS avg_start_time
FROM
december_fall12
GROUP BY
start_week_date,
member_casual
ORDER BY
start_week_date DESC;
--This is Q2 from where I want to get the full combination--
SELECT
october_fall10.member_casual,
october_fall10.start_week_date,
october_fall10.member_casual_start,
october_fall10.avg_start_time,
november_fall11.member_casual,
november_fall11.start_week_date,
november_fall11.member_casual_start,
november_fall11.avg_start_time,
december_fall12.member_casual,
december_fall12.start_week_date,
december_fall12.member_casual_start,
december_fall12.avg_start_time
FROM
( october_fall10 JOIN (november_fall11 JOIN december_fall12 USING (member_casual))
USING
(member_casual) )
CodePudding user response:
First, when you create a 'table' using WITH
, it's not a temporary table, it's a Common Table Expression (CTE). It's useful to understand the different terminology, as they behave in different ways (much like confusing petrol and diesel doesn't go so well).
One of the ways that they differ is that a CTE is no persisted in any way. Once you make your final query, the CTE goes out of scope and can not be referenced again (after the ;
, the CTE is 'gone').
This means that the pattern you want is...
WITH
october_fall10 AS
(
SELECT
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_lenght,
member_casual
FROM
`ciclystic.cyclistic_seasonal_analysis.fall_202010` AS fall_analysis
),
october_fall10_aggregate AS
(
SELECT
member_casual,
start_week_date,
COUNT(member_casual) AS member_casual_start,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
)
AS avg_start_time
FROM
october_fall10
GROUP BY
start_week_date,
member_casual
),
november_fall11 AS
(
<your query here>
),
november_fall11_aggregate AS
(
<your query here>
),
december_fall12 AS
(
<your query here>
),
december_fall12_aggregate AS
(
<your query here>
),
SELECT
october_fall10_aggregate.member_casual,
october_fall10_aggregate.start_week_date,
october_fall10_aggregate.member_casual_start,
october_fall10_aggregate.avg_start_time,
november_fall11_aggregate.member_casual,
november_fall11_aggregate.start_week_date,
november_fall11_aggregate.member_casual_start,
november_fall11_aggregate.avg_start_time,
december_fall12_aggregate.member_casual,
december_fall12_aggregate.start_week_date,
december_fall12_aggregate.member_casual_start,
december_fall12_aggregate.avg_start_time
FROM
october_fall10_aggregate
JOIN
november_fall11_aggregate
USING (member_casual)
JOIN
december_fall12_aggregate
USING (member_casual)
This creates new CTEs with a _aggregate
suffix, and then references those in your final query.
- Note that I also removed the
ORDER BY
from each CTE - This is because CTEs (and table, temporary or otherwise) do not retain that ordering
Also, repeating all that code is insane. You're much better off with all the data in a single table and writing the code once. Failing that you can union the tables 'as-if' they're one table...
WITH
unioned AS
(
SELECT 202010 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202010`
UNION ALL
SELECT 202011 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202011`
UNION ALL
SELECT 202012 AS source_tbl, * FROM `ciclystic.cyclistic_seasonal_analysis.fall_202012`
),
extraction AS
(
SELECT
source_tbl,
member_casual,
start_station_name,
end_station_name,
start_station_id,
end_station_id,
EXTRACT (DATE FROM started_at) AS start_date,
EXTRACT(DAYOFWEEK FROM started_at) AS start_week_date,
EXTRACT (TIME FROM started_at) AS start_time,
EXTRACT (DATE FROM ended_at) AS end_date,
EXTRACT(DAYOFWEEK FROM ended_at) AS end_week_date,
EXTRACT (TIME FROM ended_at) AS end_time,
DATETIME_DIFF (ended_at,started_at, MINUTE) AS total_length
FROM
unioned
)
SELECT
member_casual,
source_tbl,
start_week_date,
COUNT(member_casual) AS member_casual_start,
TIME(
EXTRACT(hour FROM AVG(start_time - '0:0:0')),
EXTRACT(minute FROM AVG(start_time - '0:0:0')),
EXTRACT(second FROM AVG(start_time - '0:0:0'))
)
AS avg_start_time
FROM
extraction
GROUP BY
member_casual,
source_tbl,
start_week_date
ORDER BY
member_casual,
source_tbl,
start_week_date
This is NOT the same format as you had before, but is much more in keeping with the intended patterns for SQL.