I have a denormalised table and I want to simplify it using repeated fields inside of BigQuery. To illustrate what I am trying to do, I want to go from:
|CustomerNumber|InvoiceNumber|InvoiceLineItem|
|--------------|-------------|---------------|
|78278278 |8765 |VV190 |
|78278278 |8765 |VV191 |
|78278278 |9321 |VV198 |
|78278278 |9321 |VV199 |
To:
|CustomerNumber|InvoiceNumber [REPEATED]|InvoiceLineItem [REPEATED]|
|--------------|------------------------|--------------------------|
|78278278 |8765 |VV190 |
| | |VV191 |
| |------------------------|--------------------------|
| |9321 |VV198 |
| | |VV199 |
I am able to create this type of schema in BigQuery but unable to write the SQL Query to go from my denormalised data to my desired table schema.
[
{
"name": "CustNumber",
"type": "STRING"
},
{
"fields": [
{
"name": "InvoiceNumber",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "InvoiceLineItem",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "OrderInfo",
"type": "RECORD"
}
]
Screenshot of the schema as seen in the Bigquery Console:
Any help would be greatly appreciated. Thanks :D
Additional ressources - A Sample to get started playing:
WITH
DATA AS (
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV190" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV191" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV198" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV199" AS InvoiceLineItem )
CodePudding user response:
Try this
WITH
DATA AS (
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV190" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"8765" AS InvoiceNumber,
"VV191" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV198" AS InvoiceLineItem
UNION ALL
SELECT
"78278278" AS CustomerNumber,
"9321" AS InvoiceNumber,
"VV199" AS InvoiceLineItem )
SELECT CustomerNumber, ARRAY_AGG(STRUCT (InvoiceNumber,InvoiceLineItem)) OrderInfo FROM (
SELECT CustomerNumber, InvoiceNumber, ARRAY_AGG(InvoiceLineItem) InvoiceLineItem FROM DATA
GROUP BY CustomerNumber, InvoiceNumber
)
GROUP BY CustomerNumber