I just started learning Cassandra and I have a fairly specific question. From the videos and tutorials I have looked at, my understanding is that a table has one primary key, which is made up of partition keys and clustering keys.
So my question is: If I have some data that looks like this:
Unit: (unit_id, unit_name, unit_description)
and I want to save each unit's working hours, which look like:
Working hours: (unit_id, day, start_time, end_time)
How would you suggest modeling this data? In addition, if I need to get all units that work on Sunday, or all units that start at 10:00, what would you suggest?
CodePudding user response:
You're actually doing it backwards by starting with how you want to store the data. It's a common mistake for those coming from a traditional relational background.
In Cassandra data modelling, we always start with the application query. For each app query, we design a CQL table that serves it.
If your app query is "get units where day is X", we would design a table that looks like this:
CREATE TABLE units_by_day (
day text,
unit_id text,
...
PRIMARY KEY (day, unit_id)
)
For each day, there are one or more rows of units (clustered by unit_id). To retrieve the units that worked on Sunday:
SELECT unit_id FROM units_by_day WHERE day = 'Sunday'
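If it helps to picture what that primary key does, here is a rough in-memory sketch in Python. It is only a mental model, not how Cassandra is implemented: the sample rows and the select_units helper are made up for illustration, and end_time is assumed to be one of the columns hidden behind the "..." in the table.

```python
from collections import defaultdict

# Hypothetical sample data: (unit_id, day, start_time, end_time)
working_hours = [
    ("u2", "Sunday", "10:00", "18:00"),
    ("u1", "Sunday", "08:00", "16:00"),
    ("u1", "Monday", "08:00", "16:00"),
]

# Model of units_by_day: partition key (day) -> rows keyed by the
# clustering column (unit_id).
units_by_day = defaultdict(dict)
for unit_id, day, start, end in working_hours:
    units_by_day[day][unit_id] = (start, end)


def select_units(day):
    """Rough equivalent of: SELECT unit_id FROM units_by_day WHERE day = ?"""
    # Rows inside a partition come back in clustering order.
    return sorted(units_by_day.get(day, {}))


print(select_units("Sunday"))  # ['u1', 'u2']
```

The query touches a single partition and never scans other days. Also note that because the primary key is (day, unit_id), writing a second shift for the same unit on the same day overwrites the first one, just like the dict assignment does here.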
If your app query is "get units that worked on day X and started at time Y", the table would look like this:
CREATE TABLE units_by_day_starttime (
day text,
start_time time,
unit_id text,
...
PRIMARY KEY (day, start_time, unit_id)
)
And the CQL query would look like:
SELECT unit_id FROM units_by_day_starttime
WHERE day = ?
AND start_time = ?
Note that because start_time is a clustering column, you can also do range queries on it, like:
SELECT unit_id FROM units_by_day_starttime
WHERE day = ?
AND start_time >= ?
AND start_time < ?
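The reason the range query is cheap is that rows inside a partition are kept sorted by the clustering columns. A minimal Python sketch of that idea follows, again only as a mental model: the insert and select_range helpers and the sample units are hypothetical, not part of any Cassandra API.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import time

# Model of units_by_day_starttime: each partition (day) holds rows
# kept sorted by the clustering columns (start_time, unit_id).
partitions = defaultdict(list)


def insert(day, start_time, unit_id):
    row = (start_time, unit_id)
    rows = partitions[day]
    i = bisect_left(rows, row)
    # Same primary key means the same row, so never store a duplicate.
    if i == len(rows) or rows[i] != row:
        rows.insert(i, row)


def select_range(day, start_from, start_before):
    """Rough equivalent of:
    WHERE day = ? AND start_time >= ? AND start_time < ?"""
    rows = partitions.get(day, [])
    # A 1-tuple sorts before any row that shares its start_time,
    # so both bounds land on the first row at or after that time.
    lo = bisect_left(rows, (start_from,))
    hi = bisect_left(rows, (start_before,))
    return [unit_id for _, unit_id in rows[lo:hi]]


# Hypothetical sample data
insert("Sunday", time(10, 0), "u2")
insert("Sunday", time(8, 0), "u1")
insert("Sunday", time(10, 0), "u3")
insert("Sunday", time(12, 0), "u4")

print(select_range("Sunday", time(9, 0), time(12, 0)))  # ['u2', 'u3']
```

Keep in mind that the two tables are independent, so your application has to write each working-hours record to both of them. Following the same approach, a query on start_time alone (without a day) would need its own table partitioned by start_time.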
Cheers!