Home > Software design >  Bigquery - Repeated field inside of a repeated record
Bigquery - Repeated field inside of a repeated record

Time:08-12

I have a denormalised table and I want to simplify it using repeated fields inside of BigQuery. To illustrate what I am trying to do, I want to go from:

|CustomerNumber|InvoiceNumber|InvoiceLineItem|
|--------------|-------------|---------------|
|78278278      |8765         |VV190          |
|78278278      |8765         |VV191          |
|78278278      |9321         |VV198          |
|78278278      |9321         |VV199          |

To:

|CustomerNumber|InvoiceNumber [REPEATED]|InvoiceLineItem [REPEATED]|
|--------------|------------------------|--------------------------|
|78278278      |8765                    |VV190                     |
|              |                        |VV191                     |
|              |------------------------|--------------------------|
|              |9321                    |VV198                     |
|              |                        |VV199                     |

I am able to create this type of schema in BigQuery but unable to write the SQL Query to go from my denormalised data to my desired table schema.

[
  {
    "name": "CustNumber",
    "type": "STRING"
  },
  {
    "fields": [
      {
        "name": "InvoiceNumber",
        "type": "STRING"
      },
      {
        "mode": "REPEATED",
        "name": "InvoiceLineItem",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "OrderInfo",
    "type": "RECORD"
  }
]

Screenshot of the schema as seen in the Bigquery Console:

Any help would be greatly appreciated. Thanks :D

Additional ressources - A Sample to get started playing:

WITH
  DATA AS (
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV190" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV191" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV198" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV199" AS InvoiceLineItem )

CodePudding user response:

Try this

WITH
  DATA AS (
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV190" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "8765" AS InvoiceNumber,
    "VV191" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV198" AS InvoiceLineItem
  UNION ALL
  SELECT
    "78278278" AS CustomerNumber,
    "9321" AS InvoiceNumber,
    "VV199" AS InvoiceLineItem )

    
SELECT CustomerNumber, ARRAY_AGG(STRUCT (InvoiceNumber,InvoiceLineItem)) OrderInfo FROM (
    SELECT CustomerNumber, InvoiceNumber, ARRAY_AGG(InvoiceLineItem) InvoiceLineItem FROM DATA
    GROUP BY CustomerNumber, InvoiceNumber    
)
GROUP BY CustomerNumber
  • Related