Create BigQuery table from existing table, including

I want to create a new table from an existing one, and add one column. But, and this seems to make this tricky, I want it to be partitioned by _PARTITIONTIME.

I know I can create a table from an existing table, like so:

CREATE OR REPLACE TABLE `mydataset.mytable_new`
AS SELECT * FROM `mydataset.mytable` 
--JOIN the new column here
LIMIT 0

I also know that I can create a partitioned table, like so:

CREATE OR REPLACE TABLE `mydataset.mytable_new`
(
  date DATE,
  var1 STRING,
  var2 INT64,
  --add new column here
)
PARTITION BY DATE(_PARTITIONTIME);

But: How can I combine the two? I tried this:

CREATE OR REPLACE TABLE `mydataset.mytable_new`
PARTITION BY DATE(_PARTITIONTIME)
AS SELECT * FROM `mydataset.mytable`
-- JOIN new column here
LIMIT 0

However, this gives me the error 'Unrecognized name: _PARTITIONTIME'.

Any hints are greatly appreciated!

CodePudding user response：

You may try like :

CREATE TABLE
  mydataset.newtable (transaction_id INT64, transaction_date DATE)
PARTITION BY
  transaction_date
AS SELECT transaction_id, transaction_date FROM mydataset.mytable

From doc : cloud.google.com/bigquery/docs/creating-partitioned-tables#sql

CodePudding user response：

This is a documented limitation of the partitioning expression of the CREATE TABLE syntax:

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#partition_expression

_PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.

I believe you should be able to split the work. Use a statement to create the new table, then issue INSERT statement(s) to populate from the original table.

However, if you're already dealing with a sizable table, you may want to re-consider this partitioning scheme. By default, all the data from the original table would land in a single partition (the current date).