How to chunk multiple user_id together to form another column in a data.table?

Time:02-27

I need to group and sort my rows by the primary key (user_id) so that each user's rows are chunked together. To do that, I need to introduce a new column that acts as a counter, increasing each time user_id changes. Essentially, I need to make something like this:

chunk_user |   user    |     item     |
1          |  200401   |    78832     |
1          |  200401   |    95718     |
1          |  200401   |    24161     |
2          |  200402   |    12437     |
2          |  200402   |    61490     |
2          |  200402   |    45956     |

from something like this:

user id | isbn | rating
    123 | 4567 | 2
    129 | 7890 | 3
    127 | 4450 | 0
    123 | 9972 | 1

I tried starting with group_by(), but not only does it not work the way I wanted, it doesn't even sort my data properly.

table_rated <- tbl_ratings %>% group_by(user_id) %>% arrange(isbn, ratings, sort = TRUE)

I really appreciate any help!

CodePudding user response:

It's not entirely clear, but this may be close to what you're looking for.

You can use rleid from data.table to number consecutive runs of identical user_id values. Once the rows are sorted by user_id, each user gets its own chunk number.
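To illustrate how rleid behaves, here is a minimal sketch on two made-up vectors (the variable names are hypothetical and the values are borrowed from the question's sample). It suggests why the sort has to happen before the counter is computed: rleid numbers runs of adjacent equal values, not distinct values.

```r
library(data.table)

# Hypothetical user_id vector, already sorted so equal ids sit next to each other
sorted_ids <- c(123, 123, 127, 129)
sorted_chunks <- rleid(sorted_ids)      # 1 1 2 3

# The same ids in the question's original, unsorted order
unsorted_ids <- c(123, 129, 127, 123)
unsorted_chunks <- rleid(unsorted_ids)  # 1 2 3 4: the two 123 rows get different ids
```

On unsorted input the two rows for user 123 end up in different chunks, which is why the sorting step comes first.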

When using arrange on grouped data, you can include .by_group = TRUE to sort first by the grouping variable.

Let me know if this is what you had in mind.

library(tidyverse)
library(data.table)

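tbl_ratings %>%
  group_by(user_id) %>%
  # .by_group = TRUE sorts by user_id first, then by isbn and rating
  arrange(isbn, rating, .by_group = TRUE) %>%
  ungroup() %>%
  # rleid() gives each consecutive run of identical user_id values its own number
  mutate(chunk_user = rleid(user_id))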

Output

  user_id  isbn rating chunk_user
    <dbl> <dbl>  <int>      <int>
1     123  4567      2          1
2     123  9972      1          1
3     127  4450      0          2
4     129  7890      3          3
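Since the question mentions data.table, the same result can probably be reached without dplyr. The sketch below is one possible alternative, not the approach from the answer above: it assumes the ratings are held in a data.table (here a hypothetical `dt` rebuilt from the question's four sample rows), sorts by reference with setorder, and uses the special symbol .GRP as the per-user counter.

```r
library(data.table)

# Hypothetical sample data taken from the question's second table
dt <- data.table(
  user_id = c(123, 129, 127, 123),
  isbn    = c(4567, 7890, 4450, 9972),
  rating  = c(2L, 3L, 0L, 1L)
)

# Sort in place by user_id, then isbn, then rating
setorder(dt, user_id, isbn, rating)

# .GRP is the group counter: 1 for the first user_id, 2 for the second, ...
dt[, chunk_user := .GRP, by = user_id]

# Move the new counter to the front, as in the desired layout
setcolorder(dt, "chunk_user")
```

Because the table is sorted before grouping, .GRP follows the sorted order of user_id, so the counter should match the rleid column shown above.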