Home > Blockchain >  SSIS Lookup multi-columns to one table to retrieve surrogate key
SSIS Lookup multi-columns to one table to retrieve surrogate key

Time:07-30

I have two tables: FACT_STAGE_DATES and DIM_DATES I need to lookup the dates in FACT_STAGE_DATES to retrieve the surrogate key from DIM_DATES in SSIS using the lookup component, as shown in the screenshot below:

enter image description here

The problem is: there are 21 dates in the FACT_STAGE_DATES, I was about the create 21 lookup components to achieve this, is this the best solution or is there a better way even if using SQL server.

EDIT: I am trying to implement the lookup using SQL joins, but I got a million of rows "duplicated" instead of a few thousand, this is my query:

select dat1.DATE_SRGT, dat2.DATE_SRGT, dat3.DATE_SRGT, ... , 
dat21.DATE_SRGT


FROM SAC_STAGING.DBO.STG_ACCR_FACT_REQUEST_ACCU_SNAPSHOT SHOT

left join SAC_DWH.dbo.DIM_DATE dat1
on SHOT.Stage1_End_Date = dat1.FULL_DATE

left join SAC_DWH.dbo.DIM_DATE dat2
on SHOT.Stage1_End_Date = dat1.FULL_DATE

left join SAC_DWH.dbo.DIM_DATE dat3
on SHOT.Stage1_End_Date = dat1.FULL_DATE
.
.
.
left join SAC_DWH.dbo.DIM_DATE dat21
on SHOT.Stage1_End_Date = dat1.FULL_DATE

What's wrong with this query?

CodePudding user response:

Without a design change there's no better way. In T-SQL it would just be 21 joins, so not a great deal better.

</answer>
<commentary>

But do you actually need a surrogate key for your calendar dimension? If the datatype of full_date is date, then that's only 3 bytes (vs 4 for a typical int surrogate), and it's already naturally sorted, stable, necessarily conformed across all possible sources (it will never be the case that '2022-07-30' means two different dates in two different applications*), and it is never implemented as a slowly changing dimension. Unknown and missing values can still be represented by picking date values that you know will never be in the domain (9999-12-31 = unknown, 9999-12-30 = missing, etc). Even Kimball exempts date dimensions from the surrogate key rule.

If you let the date value itself be the key, you don't need to do the lookup during loading at all.

The lookup is needed at the moment because fact data comes in from some source application with actual dates, but to populate the star schema you need to populate the fact table with the value of the surrogate. For example, suppose a source application provides fact data with a date value of 2022-07-30. In order to populate your data warehouse fact table you need to know the value of the date key in the calendar dimension associated with this date. So you have to go look it up in order to populate the fact table. Oh, it's the integer value 4213, cool, write that into the date_key column on the fact table.

But if the calendar dimension key is just the date, then you don't need to look up the key values when populating the fact table. The application is already sending a date, and the calendar dimension key is also a date. There's nothing to look up.

* unless you are working across time zones, but then that complication applies to both implementations anyway, and you would be using datetimeoffset somewhere

  • Related