I am trying to compute the pairwise similarity between accounts using similar hashtags over time.
I have code (below) that gives me the pairwise similarity between accounts for the most recent 300 tweets sent by each account. However, I would like to compute the pairwise similarity between accounts for specific slices of time (day, week, month). How can I do that?
library(rtweet)
library(widyr)
library(tidyverse)
rstats <- search_users("rstats", n = 10)
rstats_tmls <- get_timeline(rstats$user_id, n = 300)
rstats_tmls %>%
  unnest(hashtags) %>%
  count(user_id, hashtags) %>%
  pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)
# A tibble: 45 x 3
   item1               item2              similarity
   <chr>               <chr>                   <dbl>
 1 2170413740          792007388358410240      1.00
 2 2170413740          961691888939126784      1.00
 3 792007388358410240  961691888939126784      1.00
 4 1153678152838852614 2170413740              1.00
 5 1153678152838852614 792007388358410240      1.00
 6 1153678152838852614 961691888939126784      1.00
 7 2170413740          824037040996098049      0.998
 8 792007388358410240  824037040996098049      0.998
 9 824037040996098049  961691888939126784      0.998
10 1153678152838852614 824037040996098049      0.998
Answer:
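Using group_by() should work: derive the time slice (here, year and week) from created_at, then group by it before counting, so that the similarity is computed separately within each slice.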
rstats_tmls %>%
  mutate(year = lubridate::year(created_at),
         week = lubridate::week(created_at)) %>%
  unnest(hashtags) %>%
  group_by(year, week) %>%
  count(user_id, hashtags) %>%
  pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)
# # A tibble: 204 × 5
# # Groups:   year, week [112]
#     year  week item1      item2              similarity
#    <dbl> <dbl> <chr>      <chr>                   <dbl>
#  1  2014     3 2170413740 559211484               0.5
#  2  2014    11 2170413740 559211484               0.707
#  3  2017    28 2170413740 824037040996098049      1
#  4  2017    29 2170413740 824037040996098049      0.986
#  5  2017    30 2170413740 824037040996098049      1
#  6  2017    32 2170413740 824037040996098049      0.949
#  7  2017    33 2170413740 824037040996098049      0.962
#  8  2017    34 2170413740 824037040996098049      0.981
#  9  2017    36 2170413740 824037040996098049      0.707
# 10  2017    37 2170413740 824037040996098049      0.943
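For the day and month slices the question also asks about, one option is to bucket created_at with lubridate::floor_date() and group on that single column. The following is a minimal sketch rather than a tested solution for real timelines: toy_tmls and similarity_by_period() are hypothetical names, the toy data only stands in for rstats_tmls, and dropping NA hashtags assumes that tweets without hashtags show up as NA after unnest().

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(widyr)

# Hypothetical toy data standing in for rstats_tmls (not real tweets)
toy_tmls <- tibble(
  user_id    = c("a", "b", "c", "a", "b", "c"),
  created_at = as.POSIXct(
    c("2021-01-04 10:00:00", "2021-01-05 11:00:00", "2021-01-06 12:00:00",
      "2021-02-10 10:00:00", "2021-02-11 11:00:00", "2021-02-12 12:00:00"),
    tz = "UTC"
  ),
  hashtags   = list(
    c("rstats", "dataviz"), "rstats", "python",
    "rstats", c("rstats", "tidyverse"), "rstats"
  )
)

# Hypothetical helper: pairwise similarity within each time slice.
# `unit` is passed on to floor_date(), e.g. "day", "week" or "month".
similarity_by_period <- function(tmls, unit = "week") {
  tmls %>%
    mutate(period = floor_date(created_at, unit = unit)) %>%
    unnest(hashtags) %>%
    filter(!is.na(hashtags)) %>%   # assumption: tweets without hashtags are NA
    count(period, user_id, hashtags) %>%
    group_by(period) %>%
    pairwise_similarity(user_id, hashtags, n, upper = FALSE) %>%
    ungroup()
}

monthly <- similarity_by_period(toy_tmls, unit = "month")
monthly
```

Passing unit = "day" or unit = "week" changes the slice without touching the rest of the pipeline. A period in which fewer than two accounts used any hashtag has no pair to compare, so it would be expected to contribute no rows to the result.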