Cassandra table data modeling

I just started learning about Cassandra and I have a fairly specific question. From the videos and tutorials I have watched, my understanding is that each table has one primary key, which is made up of partition key(s) and clustering key(s).

So my question is: if I have some data that looks like this:

Unit: (unit_id, unit_name, unit_description)

and I want to store each unit's working hours, which look like this:

Working hours: (unit_id, day, start_time, end_time)

What would you suggest the data model should look like? Also, if I need to get all units that work on Sunday, or all units that start at 10:00, what would you suggest?

Answer:

You're actually doing it backwards by starting with how you want to store the data. It's a common mistake for those coming from a traditional relational background.

In Cassandra data modelling, we always start with the application queries. For each app query, we design a CQL table to serve it.

If your app query is "get units where day is X", we would design a table that looks like this:

CREATE TABLE units_by_day (
    day text,
    unit_id text,
    ...
    PRIMARY KEY (day, unit_id)
);

Each day is one partition holding one row per unit, with the rows clustered (sorted) by unit_id. To retrieve the units that work on Sunday:

SELECT unit_id FROM units_by_day WHERE day = 'Sunday';
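To make the partition/clustering split concrete, here is a small sketch with made-up data (the unit IDs are hypothetical, and only the key columns of the table above are populated). Each day value becomes its own partition, so a unit that works on several days appears once in each of those partitions.

```cql
-- Hypothetical sample data: the unit IDs are invented for illustration.
-- unit-001 works on Sunday and Monday, so it is stored in both partitions.
INSERT INTO units_by_day (day, unit_id) VALUES ('Sunday', 'unit-002');
INSERT INTO units_by_day (day, unit_id) VALUES ('Sunday', 'unit-001');
INSERT INTO units_by_day (day, unit_id) VALUES ('Monday', 'unit-001');

-- Reads only the 'Sunday' partition; rows come back sorted by the
-- clustering column unit_id: unit-001, then unit-002.
SELECT unit_id FROM units_by_day WHERE day = 'Sunday';
```

Insertion order does not matter here: within a partition Cassandra stores and returns rows in clustering-key order.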

If your app query is "get units that work on day X and start at time Y":

CREATE TABLE units_by_day_starttime (
    day text,
    start_time time,
    unit_id text,
    ...
    PRIMARY KEY (day, start_time, unit_id)
);

And the CQL query would look like:

SELECT unit_id FROM units_by_day_starttime
  WHERE day = ?
  AND start_time = ?;
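Because each query gets its own table, the same working-hours record has to be written to every table that serves a query. One way to keep the two tables above in step is a logged batch. This is a sketch with hypothetical values that only fills in the columns both tables actually declare, followed by the question's "Sunday at 10:00" lookup with literal values in place of the `?` bind markers.

```cql
-- Hypothetical record: unit-001 works on Sunday, starting at 10:00.
-- BEGIN BATCH (without UNLOGGED) is a logged batch: if one write is
-- applied, the other will eventually be applied too.
BEGIN BATCH
  INSERT INTO units_by_day (day, unit_id)
    VALUES ('Sunday', 'unit-001');
  INSERT INTO units_by_day_starttime (day, start_time, unit_id)
    VALUES ('Sunday', '10:00:00', 'unit-001');
APPLY BATCH;

-- "Units that work on Sunday and start at 10:00", using time literals.
SELECT unit_id FROM units_by_day_starttime
  WHERE day = 'Sunday'
  AND start_time = '10:00:00';
```

The batch is there for consistency between the denormalized tables, not for speed; writes that span several partitions add coordinator work.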

Because start_time is a clustering column, you can also run range queries on it within a single day:

SELECT unit_id FROM units_by_day_starttime
  WHERE day = ?
  AND start_time >= ?
  AND start_time < ?;
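With literal values filled in (the times are only examples), the range query reads one contiguous slice of the Sunday partition, since rows there are stored sorted by start_time.

```cql
-- Hypothetical window: units whose shift starts on Sunday
-- from 09:00 (inclusive) up to 12:00 (exclusive).
SELECT unit_id, start_time FROM units_by_day_starttime
  WHERE day = 'Sunday'
  AND start_time >= '09:00:00'
  AND start_time < '12:00:00';
```

The question's other case, every unit that starts at 10:00 on any day, is different: day is the partition key of this table, so restricting start_time alone is rejected unless you add ALLOW FILTERING, which has to scan across partitions. Following the one-table-per-query rule above, that query would get its own table partitioned by start_time.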

Cheers!