I've just started learning Data Modeling in DW. And now I'm a bit confused about choosing with field to build a Dimension Table and whether should I split into many DIM tables For example, I have a Employee Table and it's columns as below:
EmployeeID
FirstName
LastName
State
Age
Department
Position
Skin Color
Hair Color
Sale Amount (Fact)
Emp_DIM table content every column above except Sale Amount
and a Fact table that contains
EmployeeID (Surrogate Key)
Sale Amount (Fact)
Ok, put the Date DIM aside. I want to ask about the Emp_DIM table
- Should I add Department_DIM(ID, Name, HeadPerson), State_DIM (ID, Name, StateCode), Position_DIM (ID, Name, Description), and Color_DIM (ID, Name) to represent Department, State, Position, Color? Then the Fact_Table will also have DepartmentID, StateID,... in it. Because sometimes I saw Department_DIM, Position_DIM and State_DIM table on the internet
- How can I know which attribute should be a separate DIM_table? Because in the above example, I can still query everything like Total Sale Amount per Department, per State, per Position,... with SQL and join with 1 DIM table only.
According to what I read on the internet, the DIM table should have as many fields as possible, and contain descriptive information. But I'm still confused and not sure.
CodePudding user response:
Should I add Department_DIM
If a Department has other attributes you need to report on, then you should consider splitting it out. This "normalization of dimensions" is also called "snowflaking", or moving from a star schema to a snowflake schema. But it's totally optional and people do it both ways.
Then the Fact_Table will also have DepartmentID
This a separate decision. This is denormalizing by putting non-key dimension attributes directly on your fact table, and it's a performance optimization. You might do this for a few key attributes.