Home > Software engineering >  Pivoting columns from STRUCT type (complex schema)
Pivoting columns from STRUCT type (complex schema)

Time:12-11

I am working with a table called "Campaigns". It has a bit of a complex schema:

Field name Type Mode
internal_customer_id STRING NULLABLE
type STRING NULLABLE
ingest_timestamp TIMESTAMP NULLABLE
timestamp TIMESTAMP NULLABLE
properties RECORD NULLABLE
properties.campaign_name STRING NULLABLE
properties. variant_id NUMERIC NULLABLE
properties. action_id NUMERIC NULLABLE
properties. action_type STRING NULLABLE
properties. variant_type STRING NULLABLE
properties. new_choice BOOLEAN NULLABLE
properties. variant STRING NULLABLE
properties. campaign_id STRING NULLABLE
properties. integration_id STRING NULLABLE
properties. subject STRING NULLABLE
properties. message STRING NULLABLE
properties. type STRING NULLABLE
properties. integration_name STRING NULLABLE
properties. campaign_policy STRING NULLABLE
properties. action_name STRING NULLABLE
properties. consent_category STRING NULLABLE
properties. status STRING NULLABLE
properties. language STRING NULLABLE
properties. message_id STRING NULLABLE
properties. recipient STRING NULLABLE
properties. sent_timestamp NUMERIC NULLABLE
properties. sending_ip STRING NULLABLE
properties. country STRING NULLABLE
properties. city STRING NULLABLE
properties. ip STRING NULLABLE
properties. latitude NUMERIC NULLABLE
properties. state STRING NULLABLE
properties. user_agent__a510e506 STRING NULLABLE
properties. longitude NUMERIC NULLABLE
properties. code NUMERIC NULLABLE
properties. cumulative BOOLEAN NULLABLE
properties. url STRING NULLABLE
properties. xpath STRING NULLABLE
properties. valid_until NUMERIC NULLABLE
properties. attempts NUMERIC NULLABLE
properties. status_code NUMERIC NULLABLE
properties. audience_id STRING NULLABLE
properties. integration_type STRING NULLABLE
properties. operation STRING NULLABLE
raw_properties RECORD NULLABLE
raw_properties. campaign_name STRING NULLABLE
raw_properties. variant_id STRING NULLABLE
raw_properties. action_id STRING NULLABLE
raw_properties. action_type STRING NULLABLE
raw_properties. variant_type STRING NULLABLE
raw_properties. new_choice STRING NULLABLE
raw_properties. variant STRING NULLABLE
raw_properties. campaign_id STRING NULLABLE
raw_properties. integration_id STRING NULLABLE
raw_properties. subject STRING NULLABLE
raw_properties. message STRING NULLABLE
raw_properties. type STRING NULLABLE
raw_properties. integration_name STRING NULLABLE
raw_properties. campaign_policy STRING NULLABLE
raw_properties. action_name STRING NULLABLE
raw_properties. consent_category STRING NULLABLE
raw_properties. status STRING NULLABLE
raw_properties. language STRING NULLABLE
raw_properties. message_id STRING NULLABLE
raw_properties. recipient STRING NULLABLE
raw_properties. sent_timestamp STRING NULLABLE
raw_properties. sending_ip STRING NULLABLE
raw_properties. country STRING NULLABLE
raw_properties. city STRING NULLABLE
raw_properties. ip STRING NULLABLE
raw_properties. latitude STRING NULLABLE
raw_properties. state STRING NULLABLE
raw_properties. user_agent__a510e506 STRING NULLABLE
raw_properties. longitude STRING NULLABLE
raw_properties. code STRING NULLABLE
raw_properties. cumulative STRING NULLABLE
raw_properties. url STRING NULLABLE
raw_properties. xpath STRING NULLABLE
raw_properties. valid_until STRING NULLABLE
raw_properties. attempts STRING NULLABLE
raw_properties. status_code STRING NULLABLE
raw_properties. audience_id STRING NULLABLE
raw_properties. integration_type STRING NULLABLE
raw_properties. operation STRING NULLABLE

The field "properties.status" can have different values - "opened", "clicked" and a few other values that are not relevant at the moment. Bringing context to my question - this is a table that stores information from user's activities on the emails that we send to them. Every time a user opens and clicks on an email, a row is created on the table. So if the user opens the same email 3 times, we will see 3 lines with the same information "properties.status = "opened".

I want to use just a few fields from this column, and most importantly I want to see how many USERS have opened and clicked each of our emails. Bring this to SQL, I would like to count the distinct "internal_customer_id" for when the status was "opened" and "clicked", grouped by Campaign Name. So what I'd like to see as an output is:

Campaign_Name Subject Opened Clicked

I did try to Pivot, as you can see on this question but the output is that for every opened and for every click, I have one line in the table where Opened = 1 and Clicked = 1.

CodePudding user response:

I think, you are looking for below

select * from (
  select UserID, 
    properties.CampaignName, 
    properties.Subject, 
    properties.Status 
  from your_table
)
pivot (count(distinct UserID) for Status in ('opened', 'clicked'))  
  • Related