I'm trying to run
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
in BigQuery which shows the approprite result. But the problem is when I want to create a table with the same order it's become a mess and order would not considered.
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ) AS seqnum_c
FROM t
IN Addition I tried:
CREATE OR REPLACE TABLE `test_2` AS
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.Barcode, t.Country_Code ORDER BY t.Barcode, t.Country_Code) AS seqnum_c
FROM t
And got the same result. Have you ever faced the same issue?
CodePudding user response:
You need to specify how you want the rows within the partition to be ordered in order for it to be deterministic.
It looks like you attempted to do this in your second example, but you did ORDER BY t.Barcode, t.Country_Code
which are exactly your partition columns. That means that within each partition, each row will already have exactly the same barcode
and country_code
so effectively, there is no ordering happening.
For example, given the following rows
Barcode Country_Code Timestamp
111 USA 12345
111 USA 12346
111 JP 12350
You are partitioning by Barcode
and Country_code
so the first two rows will be a part of the same partition. However, since you don't specify an order, you cannot know which row will get which row number. In the example above, it would make sense to ORDER BY Timestamp
, but without knowing your data or your goals it's hard to say what the right logic is for you.
In short, you need to specify an ORDER BY
column that is not a part of the PARTITION BY
columns in order to deterministically order the rows within each partition.
CodePudding user response:
Thanks @ken for your response. I guess I found my answer which is:
CREATE OR REPLACE TABLE t
AS (
SELECT t.*,
ROW_NUMBER() over (partition by t.Barcode, t.Country_Code order by Barcode, Country_Code ) as seqnum_c
FROM t)
ORDER BY Barcode,Country_Code,seqnum_c);
Best