I am working on a social networking project with a post system.
Here is my current post table
CREATE TABLE afterstr.posts (
user_id text,
date timeuuid,
content text,
PRIMARY KEY (user_id, date)
) WITH CLUSTERING ORDER BY (date DESC)
My problem is that users can make multiple posts.
In the context of loading a feed by date of publication, I would like to retrieve posts from several people and sort them by date with a limit of 3.
Here is the select command:
SELECT * FROM posts where user_id in ('234235440', '992777') limit 3;
but cassandra does not send me the last 3 posts of all users but only the last 2 of the first id in the "IN" without sorting by date.
Cassandra responce (the numbers 1, 2 and 3 are the contents and represent the order of sending):
234235440 | a5cf4950-d46a-11ec-a4e5-19251c2b96e1 | 3
234235440 | a5cf4950-d46a-11ec-a4e5-190382029828 | 1
992777 | a5cf4950-d46a-11ec-a4e5-202093838892 | 2
Here is the expected answer
234235440 | a5cf4950-d46a-11ec-a4e5-19251c2b96e1 | 3
992777 | a5cf4950-d46a-11ec-a4e5-202093838892 | 2
234235440 | a5cf4950-d46a-11ec-a4e5-190382029828 | 1
Do you have any idea how to solve the problem by avoiding ORDER BY which is not recommended for this type of request ? Do you think that the use of a search engine could be an optimised solution?
CodePudding user response:
Cassandra guarantees order only inside the single partition, but not between partitions. When you specify multiple values for partition key, then it will take data from first partition, sorted only inside it, then from second partition, again sorted only inside it, etc. If you need to have a global order, you need to sort yourself.
CodePudding user response:
The column content
is not even part of the clustering columns in first place to get the results sorted. Rest of the working is as explained by Alex earlier.
token@cqlsh:astra1> select user_id,toTimestamp(date),content from astra1.posts;
user_id | system.totimestamp(date) | content
----------- --------------------------------- ---------
234235440 | 2022-05-15 17:41:54.283000 0000 | 3
234235440 | 2022-05-15 17:41:25.721000 0000 | 1
992777 | 2022-05-15 17:41:35.414000 0000 | 2
(3 rows)
In the above result, as you can see within the given partition key column of user_id
, the result of clustering column date
is already sorted by default how we wanted it to sort (i.e. date DESC
).
If I had wanted the results by ASC
sort order, then I could do something like below:
token@cqlsh:astra1> select user_id,toTimestamp(date),content from astra1.posts where user_id='234235440' order by date asc;
user_id | system.totimestamp(date) | content
----------- --------------------------------- ---------
234235440 | 2022-05-15 17:41:25.721000 0000 | 1
234235440 | 2022-05-15 17:41:54.283000 0000 | 3
(2 rows)
Let's say if I've the below data in my table for user_id = 234235440
,
token@cqlsh:astra1> select user_id,toTimestamp(date),content from astra1.posts where user_id='234235440' order by date asc;
user_id | system.totimestamp(date) | content
----------- --------------------------------- ---------
234235440 | 2022-05-15 17:51:50.351000 0000 | 1
234235440 | 2022-05-15 17:51:53.130000 0000 | 2
234235440 | 2022-05-15 17:51:55.802000 0000 | 3
234235440 | 2022-05-15 17:51:58.261000 0000 | 4
234235440 | 2022-05-15 17:52:00.751000 0000 | 5
234235440 | 2022-05-15 17:52:03.717000 0000 | 6
(6 rows)
and if I want only the top 3 sorted by the clustering order which I've already defined while creating the table, I could leverage PER PARTITION LIMIT
to fetch only the top 3 for that given partition column,
token@cqlsh:astra1> select user_id,toTimestamp(date),content from astra1.posts where user_id='234235440' per partition limit 3;
user_id | system.totimestamp(date) | content
----------- --------------------------------- ---------
234235440 | 2022-05-15 17:52:03.717000 0000 | 6
234235440 | 2022-05-15 17:52:00.751000 0000 | 5
234235440 | 2022-05-15 17:51:58.261000 0000 | 4
(3 rows)
remember that I could also reverse the clustering column sorting order and fetch the top 3 in the reverse direction like this:
token@cqlsh:astra1> select user_id,toTimestamp(date),content from astra1.posts where user_id='234235440' order by date asc per partition limit 3;
user_id | system.totimestamp(date) | content
----------- --------------------------------- ---------
234235440 | 2022-05-15 17:51:50.351000 0000 | 1
234235440 | 2022-05-15 17:51:53.130000 0000 | 2
234235440 | 2022-05-15 17:51:55.802000 0000 | 3
(3 rows)
For example, if you want only up to the top 3 rows per partition across that table, your query would look like below:
token@cqlsh:astra1> select * from astra1.posts per partition limit 3;
user_id | date | content
----------- -------------------------------------- ---------
234235440 | ba85cb50-d477-11ec-b834-4dbc24fdf387 | 6
234235440 | b8c137f0-d477-11ec-b834-4dbc24fdf387 | 5
234235440 | b7454650-d477-11ec-b834-4dbc24fdf387 | 4
992777 | 44065d60-d476-11ec-b834-4dbc24fdf387 | 2
(4 rows)
Hope that helps!