Home > other >  A reasonable way to design this dynamoDB schema?
A reasonable way to design this dynamoDB schema?

Time:12-31

I currently have the following data in my DynamoDB table:

person_id_and_gender | ttl(timestamp) | person_movie_rate                                             |
-------------------------------------------------------------------------------------------------------
id_1:male            | 123456789      | amazing_spider_man:0.8, iron_man:0.674, dr_strange:0.32, ...  |
id_9:non-binary      | 123000089      | batman:0.9, iron_man:0.874, terminator:0.55, lala_land:0.5 ...|
...

As you can see, this table is trying to save the relationship between a person and his/her/their ratings to a list of different movies. As the number of new movies increases rapidly, the entry size limit (400k) has been hit and thus we must cut off some ratings in order to fit into one entry of a person.

Current config: person_id_and_gender is the primary key of this table and it does not have a sort key.

Is there a better way to re-design this schema so the we won't explode the entry even if we have more and more ratings?

Please note:

  1. All column names/attributes are made up. They only serve as examples (Maybe bad examples though).

  2. In our use case, we might have more "genders" (male, female, non-binary and more...)

  3. In our use case, we assume one person may have different genders, in other words, we may see id_2:male and id_2:female shows up in the same table, and we need both data points.

CodePudding user response:

You don’t specify your query and update patterns so it’s hard to give a definitive answer.

Guessing at your patterns, my suggestion would be make the movie title the sort key. You could then get_item a person’s rating for a movie or query to fetch all a person’s (gender-tied) ratings. No movie count limit. You can keep the TTL on each item if you’d like.

CodePudding user response:

As others have pointed out, the usual health warning about starting with access patterns applies. With that big caveat in mind, a pattern that scales with the number of ratings would be:

PK SK rating birthday
id_1:male Attributes 2000-01-10
id_1:male Rating#amazing_spider_man 0.8
id_1:male Rating#iron_man 0.674
id_9:non-binary Rating#iron_man 0.874

This uses generic key names (PK and SK) and compound sort key values to model many-to-many relationships in a single table design.

PK = "id_1:male" AND SK = "Attributes" # user attributes
PK = "id_1:male" AND SK > "Rating" # all ratings for a user
PK = "id_1:male" AND SK = "Rating#amazing_spider_man" # user rating for a specific movie

If your use case requires querying by movie, you can add an index where the keys are swapped: GSI1PK is the movie and GSI1SK is the user_id.

Furthermore, if you invert the gender and id in the index's SK, you can query movie rating by gender.

GSI1PK = "iron_man" AND GSI1SK > "" # iron man ratings for all users
GSI1PK = "iron_man" AND begins_with(GSISK, "non-binary") #  iron man ratings for non-binary users
  • Related