Compute pairwise similarity over time

Time:11-09

I am trying to compute the pairwise similarity between accounts, based on the hashtags they use, over time.

I have code (below) that gives me the pairwise similarity between accounts for the most recent 300 tweets sent by each account. However, I would like to compute the pairwise similarity between accounts for specific slices of time (day, week, month). How can I do that?

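library(rtweet)
library(widyr)
library(tidyverse)

# Find 10 accounts matching "rstats", then pull up to 300 recent tweets from each
rstats <- search_users("rstats", n = 10)

rstats_tmls <- get_timeline(rstats$user_id, n = 300)

# One row per (account, hashtag) with its count, then cosine similarity per account pair
rstats_tmls %>%
   unnest(hashtags) %>%
   count(user_id, hashtags) %>%
   pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)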


# A tibble: 45 x 3
   item1               item2              similarity
   <chr>               <chr>                   <dbl>
 1 2170413740          792007388358410240      1.00 
 2 2170413740          961691888939126784      1.00 
 3 792007388358410240  961691888939126784      1.00 
 4 1153678152838852614 2170413740              1.00 
 5 1153678152838852614 792007388358410240      1.00 
 6 1153678152838852614 961691888939126784      1.00 
 7 2170413740          824037040996098049      0.998
 8 792007388358410240  824037040996098049      0.998
 9 824037040996098049  961691888939126784      0.998
10 1153678152838852614 824037040996098049      0.998

CodePudding user response:

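Using group_by() on the time slice should work; the similarity is then computed separately within each group (here, each year and week). For daily or monthly slices, replace week() with lubridate::yday() or lubridate::month():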

rstats_tmls %>%
  # Derive the grouping columns from the tweet timestamp
  mutate(year = lubridate::year(created_at),
         week = lubridate::week(created_at)) %>%
  unnest(hashtags) %>%
  group_by(year, week) %>%
  count(user_id, hashtags) %>%
  pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)


# # A tibble: 204 × 5
# # Groups:   year, week [112]
#     year  week item1      item2              similarity
#    <dbl> <dbl> <chr>      <chr>                   <dbl>
#  1  2014     3 2170413740 559211484               0.5
#  2  2014    11 2170413740 559211484               0.707
#  3  2017    28 2170413740 824037040996098049      1
#  4  2017    29 2170413740 824037040996098049      0.986
#  5  2017    30 2170413740 824037040996098049      1
#  6  2017    32 2170413740 824037040996098049      0.949
#  7  2017    33 2170413740 824037040996098049      0.962
#  8  2017    34 2170413740 824037040996098049      0.981
#  9  2017    36 2170413740 824037040996098049      0.707
# 10  2017    37 2170413740 824037040996098049      0.943
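
Since the question asks for day, week and month slices, one way to generalise the grouped approach is to bucket created_at with lubridate::floor_date() and group on that single column. The sketch below is only an illustration: toy_tmls is made-up data standing in for rstats_tmls, and similarity_by_period() is a hypothetical helper name, not part of widyr or rtweet. It also drops NA hashtags, on the assumption that tweets without hashtags carry NA in that column. Note that lubridate::week() counts 7-day blocks from January 1, whereas floor_date(unit = "week") gives calendar weeks (starting on Sunday by default), so the two approaches will not produce identical slices.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(widyr)

# Toy stand-in for rstats_tmls (made-up data; only the three columns used here)
toy_tmls <- tibble(
  user_id    = c("a", "a", "b", "b", "c", "c"),
  created_at = as.POSIXct(c("2021-11-01 10:00:00", "2021-11-22 09:00:00",
                            "2021-11-02 12:00:00", "2021-11-23 15:00:00",
                            "2021-11-03 08:00:00", "2021-11-24 18:00:00"),
                          tz = "UTC"),
  hashtags   = list(c("rstats", "dataviz"), "tidyverse",
                    c("rstats", "dataviz"), "shiny",
                    "rstats", c("tidyverse", "shiny"))
)

# Hypothetical helper: `unit` is passed to floor_date(), e.g. "day", "week", "month"
similarity_by_period <- function(tmls, unit = "week") {
  tmls %>%
    mutate(period = floor_date(created_at, unit = unit)) %>%  # start of each slice
    unnest(hashtags) %>%
    filter(!is.na(hashtags)) %>%             # assumption: NA means "no hashtags"
    count(period, user_id, hashtags) %>%
    group_by(period) %>%                     # similarity is computed within each slice
    pairwise_similarity(user_id, hashtags, n, upper = FALSE) %>%
    ungroup()
}

weekly  <- similarity_by_period(toy_tmls, "week")
monthly <- similarity_by_period(toy_tmls, "month")
```

With the real data, the call would presumably be similarity_by_period(rstats_tmls, "day") (or "week", "month"). Slices in which only one account tweeted contain no pairs, so they are not expected to contribute rows to the result.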
