Constructing a User Journey in R

Time:06-14

Session/User ID  Time Stamp             Page
101              dd - mm - yy 01:00:05  Page A
101              dd - mm - yy 01:00:10  Page B
101              dd - mm - yy 01:00:35  Page C
102              dd - mm - yy 02:00:10  Page B
102              dd - mm - yy 02:00:20  Page C
103              dd - mm - yy 02:00:35  Page A
104              dd - mm - yy 03:00:40  Page B
104              dd - mm - yy 03:00:45  Page C

I have a question similar to the one asked in "Constructing User Journey - How do you 'self, loop' join?". I want to build a path for each session ID, with the pages ordered by timestamp. I would also like to count how many sessions/users went through the same path.

I would like an outcome like this:

How many users followed the same path:

Path                      Frequency
Page A - Page B - Page C  1
Page B - Page C           2
Page A                    1

An idea of which user followed what path:

Session/User ID  Path
101              Page A - Page B - Page C
102              Page B - Page C
103              Page A
104              Page B - Page C

I would really appreciate some help. Thank you.

CodePudding user response:

You may try:

df <- read.table(text = "Session_UserID TimeStamp   Page
101 'dd - mm - yy 01:00:05' 'Page A'
101 'dd - mm - yy 01:00:10' 'Page B'
101 'dd - mm - yy 01:00:35' 'Page C'
102 'dd - mm - yy 02:00:10' 'Page B'
102 'dd - mm - yy 02:00:20' 'Page C'
103 'dd - mm - yy 02:00:35' 'Page A'
104 'dd - mm - yy 03:00:40' 'Page B'
104 'dd - mm - yy 03:00:45' 'Page C'", header = TRUE)

library(dplyr)

df %>%
  group_by(Session_UserID) %>%
  summarize(path = paste(c(Page), collapse = "-"))

  Session_UserID path                
           <int> <chr>               
1            101 Page A-Page B-Page C
2            102 Page B-Page C       
3            103 Page A              
4            104 Page B-Page C 

df %>%
  group_by(Session_UserID) %>%
  summarize(path = paste(c(Page), collapse = "-")) %>%
  group_by(path) %>%
  summarize(Frequency = n())

  path                 Frequency
  <chr>                    <int>
1 Page A                       1
2 Page A-Page B-Page C         1
3 Page B-Page C                2
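
Note that the pipeline above pastes the pages in whatever order the rows already have. If the rows might not be in chronological order, sorting first makes the path reliable. Here is a minimal sketch of that, assuming the timestamps are real date-times; the `visits` data frame and its values are made up for illustration (deliberately shuffled), and `" - "` is used as the separator to match the format in the question:

```r
library(dplyr)

# Hypothetical, deliberately shuffled data; the dates stand in for the
# 'dd - mm - yy' placeholders in the question.
visits <- data.frame(
  Session_UserID = c(101L, 102L, 101L, 104L, 103L, 101L, 102L, 104L),
  TimeStamp = as.POSIXct(c(
    "2022-06-14 01:00:35", "2022-06-14 02:00:20", "2022-06-14 01:00:05",
    "2022-06-14 03:00:45", "2022-06-14 02:00:35", "2022-06-14 01:00:10",
    "2022-06-14 02:00:10", "2022-06-14 03:00:40"
  ), tz = "UTC"),
  Page = c("Page C", "Page C", "Page A", "Page C",
           "Page A", "Page B", "Page B", "Page B")
)

# One row per session: sort by time within each session, then collapse pages.
paths <- visits %>%
  arrange(Session_UserID, TimeStamp) %>%
  group_by(Session_UserID) %>%
  summarize(Path = paste(Page, collapse = " - "), .groups = "drop")

# One row per distinct path, with the number of sessions that followed it.
path_counts <- paths %>%
  count(Path, name = "Frequency", sort = TRUE)
```

`paths` is the per-session table and `path_counts` is the frequency table; with this data, "Page B - Page C" ends up with a frequency of 2.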

CodePudding user response:

I also sort by time, in case the data is not already in chronological order:

library(tidyverse)

data <- tibble::tribble(
  ~id, ~time, ~page,
  101L, "2022-06-14 01:00:05", "Page A",
  101L, "2022-06-14 01:00:10", "Page B",
  101L, "2022-06-14 01:00:35", "Page C",
  102L, "2022-06-14 02:00:10", "Page B",
  102L, "2022-06-14 02:00:20", "Page C",
  103L, "2022-06-14 02:00:35", "Page A",
  104L, "2022-06-14 03:00:40", "Page B",
  104L, "2022-06-14 03:00:45", "Page C"
)

data %>%
  type_convert() %>%
  arrange(id, time) %>%  # arrange() ignores groups by default, so sort before grouping
  group_by(id) %>%
  summarise(
    path = page %>% paste0(collapse = "-")
  ) %>%
  count(path)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   time = col_datetime(format = ""),
#>   page = col_character()
#> )
#> # A tibble: 3 × 2
#>   path                     n
#>   <chr>                <int>
#> 1 Page A                   1
#> 2 Page A-Page B-Page C     1
#> 3 Page B-Page C            2

Created on 2022-06-14 by the reprex package (v2.0.0)
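
One caveat: the example above uses ISO-style timestamps, which sort correctly even as text. The question shows a day-first format, and day-first strings do not sort chronologically across days, so they would need to be parsed before arranging. A small sketch, assuming the real values look like "30-06-22 23:59:50" (the values and the exact format string are assumptions, adjust them to your data):

```r
# Hypothetical day-first timestamps spanning a month boundary.
raw <- c("02-07-22 00:00:05", "30-06-22 23:59:50")

# Parse with an explicit format: day-month-two-digit-year, then the time.
parsed <- as.POSIXct(raw, format = "%d-%m-%y %H:%M:%S", tz = "UTC")

text_order <- order(raw)     # sorts as text: "02-..." comes before "30-..."
time_order <- order(parsed)  # sorts chronologically: 30 June comes before 2 July
```

Here `text_order` is `1 2` while `time_order` is `2 1`, so arranging on the unparsed strings would put the later visit first.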
