Home > Mobile >  Partitioning in view of BigQuery is not remaining when create table
Partitioning in view of BigQuery is not remaining when create table

Time:09-23

I'm trying to run

SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t

in BigQuery which shows the approprite result. But the problem is when I want to create a table with the same order it's become a mess and order would not considered.

CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t

IN Addition I tried:

CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY t.Barcode, t.Country_Code) AS seqnum_c
FROM t

And got the same result. Have you ever faced the same issue?

CodePudding user response:

You need to specify how you want the rows within the partition to be ordered in order for it to be deterministic.

It looks like you attempted to do this in your second example, but you did ORDER BY t.Barcode, t.Country_Code which are exactly your partition columns. That means that within each partition, each row will already have exactly the same barcode and country_code so effectively, there is no ordering happening.

For example, given the following rows

Barcode  Country_Code  Timestamp
111      USA           12345
111      USA           12346
111      JP            12350

You are partitioning by Barcode and Country_code so the first two rows will be a part of the same partition. However, since you don't specify an order, you cannot know which row will get which row number. In the example above, it would make sense to ORDER BY Timestamp, but without knowing your data or your goals it's hard to say what the right logic is for you.

In short, you need to specify an ORDER BY column that is not a part of the PARTITION BY columns in order to deterministically order the rows within each partition.

CodePudding user response:

Thanks @ken for your response. I guess I found my answer which is:

CREATE OR REPLACE TABLE t 
 AS (
SELECT t.*,
ROW_NUMBER() over (partition by t.Barcode, t.Country_Code order by Barcode, Country_Code  ) as seqnum_c
FROM  t)
ORDER BY Barcode,Country_Code,seqnum_c);

Best

  • Related