(Appreciate this post is perhaps too high level or philosophical for SO, I'm in the schema planning phase and seeking some guidance)
After some difficulty working with a clone of our production database for analytics, I am attempting to define a events fact table along with some dimensions tables in order to make analytics work simpler.
The block I've hit in my planning is this. We have different categories of event with different dimensions needed to describe them. E.g. suppose we have 'account settings' event category as well as 'Gallery' events.
In a fact table I might have a field eventCategory and eventName with example values from above such as:
'EventCategory': 'Account Settings'
'EventName': 'Update Card Billing Details'
Or:
'EventCategory': 'Gallery'
'EventName': 'Create New Gallery'
In each case I want to use a different collection of dimensions to describe them. E.g. for Gallery events we want to know 'template', 'count of images', 'gallery category e.g. fruits'. We have no need for these details with account settings events, which have their own distinct set of dimensions to describe them.
Via the textbook examples I find online, I would have a dimensions table for Gallery events and a dimensions table for Account Settings events.
The mental block I have is that these dimensions are dynamic not static. I want to record in the fact table the value of these dimensions at the time of the event not 'now'. For example, a user can either be in trial or a paid user. If I had a dimension table 'user' their status might currently be 'paid' but at the time of some previous gallery event they may have been in trial.
What is the 'right' way to handle this:
- Multiple facts tables, one for Gallery events and one for Account Settings events?
- Use json in a new field in the main facts table e.g. 'EventDetail' which contains what would otherwise go in a dimension table except by using json we know the values of the dimensions at the time of the event as opposed to whatever those values are now?
- I could have a sparse facts table. I would include fields for each dimension across all categories and these would be null where not applicable
Given that the dimensions I use to describe an event are dynamic, what is the 'right' way to construct a fact table for analytics? The way I see it just now the dimensions tables would have to be facts themselves to capture the changing values of these attributes over time.
CodePudding user response:
To add a dimension to any SQL table is always done the same way, by adding a column.
In any kind of history, there is no "now". Every status has a time period: a beginning and ending. I usually name those columns AsOf
and Until
, because begin/end show up a lot as SQL keywords, making the column names harder to scan for. Usually, only AsOf
is needed, because you can self-join the table to find succeeding periods, and use NULL to represent 'now' (where "now" means, as of the time the query is executed).
'user' their status might currently be 'paid' but at the time of some previous gallery event they may have been in trial.
Right, so the user's status isn't just paid/trial. It's paid or trial starting AsOf some date, until a later AsOf date for the same user.
It's hard to be more helpful. There's a bit of jargon in your question, and it's couched in domain-specific terms. I hope by attaching a date/time to every status, you can see your way out of the forest.