What's the purpose of parametrizing data types in BigQuery?


In BigQuery, you can parametrize some data types (reference). For example:

DECLARE x STRING(10);
DECLARE y STRING;

These length parameters are not propagated to query results, but they are enforced on insertion and assignment.
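The assignment-time check can be sketched like this (the exact error message may differ):

```sql
-- A sketch of the enforcement behavior: the length parameter is
-- checked when a value is assigned, not carried through expressions.
DECLARE x STRING(10);
DECLARE y STRING;

SET x = 'short';          -- OK: 5 characters fit within STRING(10)
SET y = CONCAT(x, x);     -- OK: the result type is plain STRING, not STRING(10)

-- This assignment fails at runtime with an OUT_OF_RANGE error,
-- because the value exceeds the declared maximum length:
SET x = 'far too long for ten';
```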

I am now pushing data from a PostgreSQL database to BigQuery. In my PostgreSQL schema, every string column has a set maximum length, ranging from roughly 1 character (for enum-like values) to 1000 (for free text). I have to define a BigQuery table schema to push the data into. I'm not interested in validation, since my PostgreSQL schema already guarantees the exact limits on the data coming into BigQuery. Hence my question:

For my use case, is there any advantage (performance, costs, ...) to using the parametrized data types over arbitrary length strings?

CodePudding user response:

If you need neither the enforcement nor the syntactic sugar, it's entirely reasonable to skip parameterization. There's an argument for having derived tables carry the same constraints as the originating data, but that comes down to your own usage patterns and idioms.
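Concretely, the two options for mirroring a PostgreSQL `varchar(n)` schema might look like this (the dataset, table, and column names are hypothetical):

```sql
-- Parameterized: mirrors the PostgreSQL varchar limits and rejects
-- over-long values at write time.
CREATE TABLE mydataset.articles_checked (
  status STRING(1),     -- enum-like value
  body   STRING(1000)   -- free text
);

-- Unparameterized: accepts strings of any length; validation is
-- left entirely to the upstream PostgreSQL schema.
CREATE TABLE mydataset.articles_plain (
  status STRING,
  body   STRING
);
```

If the PostgreSQL schema is the sole writer and already enforces the limits, the second form is simpler and loses nothing in practice.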

BigQuery storage costs are based on the actual size of the stored strings, not the declared maximum, and partitioning/clustering don't leverage parameterization details either. So neither cost nor performance should factor into your decision.
