I need to group and sort my rows by my primary key (user_id) so that each user's rows are chunked together. To do that, I need to introduce a new column that acts like a counter. Essentially, I need to produce something like this:
chunk_user | user | item |
1 | 200401 | 78832 |
1 | 200401 | 95718 |
1 | 200401 | 24161 |
2 | 200402 | 12437 |
2 | 200402 | 61490 |
2 | 200402 | 45956 |
from something like this:
user id | isbn | rating
123 | 4567 | 2
129 | 7890 | 3
127 | 4450 | 0
123 | 9972 | 1
I tried starting with group_by(), but not only does it not work the way I wanted, it doesn't even sort my data properly.
table_rated <- tbl_ratings %>% group_by(user_id) %>% arrange(isbn, ratings, sort = TRUE)
I really appreciate any help!
CodePudding user response:
It's not entirely clear, but this may be close to what you're looking for.
You can use rleid() from data.table to enumerate the different user_id values.
When using arrange(), you can include .by_group = TRUE to sort first by the grouping variable.
Let me know if this is what you had in mind.
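library(tidyverse)
library(data.table)

tbl_ratings %>%
  group_by(user_id) %>%
  # .by_group = TRUE sorts by user_id first, then by isbn and rating
  arrange(isbn, rating, .by_group = TRUE) %>%
  ungroup() %>%
  # rleid() gives each run of identical user_id values the next integer
  mutate(chunk_user = rleid(user_id))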
Output
  user_id  isbn rating chunk_user
    <dbl> <dbl>  <int>      <int>
1     123  4567      2          1
2     123  9972      1          1
3     127  4450      0          2
4     129  7890      3          3
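If you would rather not load data.table just for rleid(), here is a minimal dplyr-only sketch of the same idea. It assumes the columns are named user_id, isbn and rating, and it rebuilds the four sample rows from the question as a plain data frame. It swaps rleid() for dense_rank(), which numbers the distinct user_id values in ascending order, so the result should match the approach above once the rows are sorted by user_id.

```r
library(dplyr)

# Sample rows copied from the question (column names are assumed)
tbl_ratings <- data.frame(
  user_id = c(123, 129, 127, 123),
  isbn    = c(4567, 7890, 4450, 9972),
  rating  = c(2L, 3L, 0L, 1L)
)

table_rated <- tbl_ratings %>%
  # Sort by user first so each user's rows sit together, then by isbn and rating
  arrange(user_id, isbn, rating) %>%
  # dense_rank() assigns 1 to the smallest user_id, 2 to the next, with no gaps
  mutate(chunk_user = dense_rank(user_id)) %>%
  # Move the counter to the front, as in the desired layout
  select(chunk_user, everything())

print(table_rated)
```

Note that dense_rank() depends only on the user_id values, not on row order, so the counter stays the same even if the rows are re-sorted later. rleid() instead numbers consecutive runs, so it only gives one number per user when the data is already sorted by user_id.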