dbt merge with multiple unique_keys how to reference the unique key with jinja?-CodePudding

as per dbt docs it states:

In cases where you need multiple columns in combination to uniquely identify each row, we recommend you pass these columns as a list (unique_key = ['user_id', 'session_number']), rather than a string expression (unique_key = 'concat(user_id, session_number)').

so I've created a small incremental model where I need to use multiple unique keys.

{{ config(materialized='table',
   incremental_strategy='merge', 
   unique_key=['col1', 'col2', 'col3']
) 
}}

WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
  WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)

, updates AS (

SELECT ...
FROM using_clause
 {% if is_incremental() %}

        WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})

  {% endif %}

) 

, inserts AS (
  SELECT ..
  FROM using_clause
  WHERE unique_keys NOT IN (select unique_keys from updates)
  ) 

SELECT * FROM updates UNION inserts

whats the correct way to select the multiple columns? I assume something along the lines of {ref(unique_keys)} ? but I couldn't find anything in the docs.

my current method just hashes the columns (md5) but i wonder if there is a native method to do this given that the unique_key param accepts multiple columns.

CodePudding user response：

The config object is available to you in the DBT Jinja context. See https://docs.getdbt.com/reference/dbt-jinja-functions/config

If you want to use the unique_key inside your model, you can do something like

SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}

After compilation, it'll look like this

SELECT col1,col2,col3, ...
FROM my_model