as per dbt
docs it states:
In cases where you need multiple columns in combination to uniquely identify each row, we recommend you pass these columns as a list (unique_key = ['user_id', 'session_number']), rather than a string expression (unique_key = 'concat(user_id, session_number)').
so I've created a small incremental model where I need to use multiple unique keys.
{{ config(materialized='table',
incremental_strategy='merge',
unique_key=['col1', 'col2', 'col3']
)
}}
WITH using_clause AS (
SELECT ...
FROM tbl
{% if is_incremental() %}
WHERE ingested_dt > (SELECT MAX(ingested_dt ) FROM {{ this }})
{% endif %}
)
, updates AS (
SELECT ...
FROM using_clause
{% if is_incremental() %}
WHERE my_unique_keys IN (SELECT unique_keys FROM {{ this }})
{% endif %}
)
, inserts AS (
SELECT ..
FROM using_clause
WHERE unique_keys NOT IN (select unique_keys from updates)
)
SELECT * FROM updates UNION inserts
whats the correct way to select the multiple columns? I assume something along the lines of {ref(unique_keys)}
? but I couldn't find anything in the docs.
my current method just hashes the columns (md5) but i wonder if there is a native method to do this given that the unique_key
param accepts multiple columns.
CodePudding user response:
The config
object is available to you in the DBT Jinja context. See https://docs.getdbt.com/reference/dbt-jinja-functions/config
If you want to use the unique_key
inside your model, you can do something like
SELECT {{ config.get("unique_key")|join(',') }}, ...
FROM {{ this }}
After compilation, it'll look like this
SELECT col1,col2,col3, ...
FROM my_model