Creating a Sequence with a particular length in R

Time:11-09

I am trying to create a path sequence. The following is a sample dataset:

df <- structure(list(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7), 
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
  .Names = c("sess_id", "Page"),
  row.names = c(NA, -12L),
  class = "data.frame")

This is the table:

sess_id Page
4 A
4 B
4 C
4 D
4 A
4 C
4 B
7 B
7 C
7 D
7 A
7 D

I would like to add three columns like so:

sess_id Page Path Start End
4 A
4 B AB A B
4 C ABC A C
4 D ABCD A D
4 A ABCDA A A
4 C BCDAC B C
4 B CDACB C B
7 B
7 C BC B C
7 D BCD B D
7 A BCDA B A
7 D BCDAD B D

Within each session, I want to build a rolling path made of at most the five most recent pages, and record the first and last page of that path as Start and End.

CodePudding user response:

Use rollapplyr from package zoo to build a rolling (right-aligned) sequence within each sess_id group. The first and last characters of each sequence are then the Start and End columns, respectively.

df <- structure(list(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7), 
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D")),
  .Names = c("sess_id", "Page"),
  row.names = c(NA, -12L),
  class = "data.frame")


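fun <- function(x, width) {
  # growing windows (widths 1, 2, ...) for the first width - 1 elements
  y1 <- zoo::rollapplyr(x, width = seq(width), paste, collapse = "")[1:(width - 1L)]
  # full-width windows for the remaining elements
  y2 <- zoo::rollapplyr(x, width = width, paste, collapse = "")
  c(y1, y2)
}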

sp <- split(df$Page, df$sess_id)
l <- 5L

df$Path <- unlist(lapply(sp, fun, width = l))
df$Start <- substr(df$Path, 1, 1)
df$End <- substring(df$Path, nchar(df$Path))
df
#>    sess_id Page  Path Start End
#> 1        4    A     A     A   A
#> 2        4    B    AB     A   B
#> 3        4    C   ABC     A   C
#> 4        4    D  ABCD     A   D
#> 5        4    A ABCDA     A   A
#> 6        4    C BCDAC     B   C
#> 7        4    B CDACB     C   B
#> 8        7    B     B     B   B
#> 9        7    C    BC     B   C
#> 10       7    D   BCD     B   D
#> 11       7    A  BCDA     B   A
#> 12       7    D BCDAD     B   D

Created on 2022-11-08 with reprex v2.0.2
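
For comparison, the same trailing-window idea can be sketched in base R without zoo. This is only an illustration, not the answerer's method: path_window is a hypothetical helper name, and the sketch assumes the rows are already in visit order within each session. It also copes with sessions that have fewer rows than the window width.

```r
# Hypothetical helper (not part of the answer above): for each position i,
# paste together the pages from max(1, i - width + 1) up to i.
path_window <- function(pages, width = 5L) {
  vapply(seq_along(pages), function(i) {
    start <- max(1L, i - width + 1L)
    paste(pages[start:i], collapse = "")
  }, character(1))
}

df <- data.frame(
  sess_id = c(4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7),
  Page = c("A", "B", "C", "D", "A", "C", "B", "B", "C", "D", "A", "D"),
  stringsAsFactors = FALSE
)

# ave() applies the helper within each sess_id and keeps the original row order.
df$Path <- ave(df$Page, df$sess_id, FUN = path_window)
df$Start <- substr(df$Path, 1, 1)
df$End <- df$Page  # the last page of a trailing window is the current page
```

Since the window always ends at the current row, End is simply Page, which avoids a second substring call.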

CodePudding user response:

You can combine accumulate() from purrr with substr(), as below:

library(dplyr)
library(purrr)

df %>%
  group_by(sess_id) %>%
  mutate(Path = accumulate(Page, paste0)) %>%
  ungroup() %>%
  mutate(
    Path = substr(Path, nchar(Path) - 4, nchar(Path)),
    Start = substr(Path, 1, 1),
    End = Page
  )

which gives

# A tibble: 12 × 5
   sess_id Page  Path  Start End  
     <dbl> <chr> <chr> <chr> <chr>
 1       4 A     A     A     A
 2       4 B     AB    A     B
 3       4 C     ABC   A     C
 4       4 D     ABCD  A     D
 5       4 A     ABCDA A     A
 6       4 C     BCDAC B     C
 7       4 B     CDACB C     B
 8       7 B     B     B     B
 9       7 C     BC    B     C
10       7 D     BCD   B     D
11       7 A     BCDA  B     A
12       7 D     BCDAD B     D
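
The two steps in this answer can be seen in isolation on a single session. The sketch below uses base R's Reduce(..., accumulate = TRUE) as a stand-in for purrr::accumulate() (an assumption made only so the example needs no packages), and relies on substr() treating a start position below 1 as 1, which is why the short early paths come back whole.

```r
pages <- c("A", "B", "C", "D", "A", "C", "B")

# Running concatenation: "A", "AB", "ABC", ... (same idea as accumulate(pages, paste0)).
cumulative <- Reduce(paste0, pages, accumulate = TRUE)

# Keep at most the last five characters; for strings shorter than five,
# the start position is below 1 and substr() returns the whole string.
last_five <- substr(cumulative, nchar(cumulative) - 4, nchar(cumulative))

print(last_five)
```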

CodePudding user response:

The following works and uses the tidyverse (dplyr and stringr). Path is first built as all pages within each sess_id pasted together. str_sub() then keeps the first n characters, where n is the row number within the session, and str_extract() keeps up to five characters from the end of that string.

The Start and End are just the first and last letters of Path.

At the end we set Path, Start and End to "" when the length of Path is one.

library(dplyr)
library(stringr)

df <- df %>% 
  group_by(sess_id) %>% 
  mutate(Path = paste0(Page, collapse = "") %>% 
           str_sub(1, row_number()) %>% 
           str_extract("\\w{0,5}$"),
         
         Start = str_extract(Path, "^\\w"), 
         End = str_extract(Path, "\\w$")) %>% 
  mutate(across(c(Path, Start, End), ~if_else(str_length(Path) == 1, "", .)))

> df
# A tibble: 12 x 5
# Groups:   sess_id [2]
   sess_id Page  Path    Start End  
     <dbl> <chr> <chr>   <chr> <chr>
 1       4 A     ""      ""    ""   
 2       4 B     "AB"    "A"   "B"  
 3       4 C     "ABC"   "A"   "C"  
 4       4 D     "ABCD"  "A"   "D"  
 5       4 A     "ABCDA" "A"   "A"  
 6       4 C     "BCDAC" "B"   "C"  
 7       4 B     "CDACB" "C"   "B"  
 8       7 B     ""      ""    ""   
 9       7 C     "BC"    "B"   "C"  
10       7 D     "BCD"   "B"   "D"  
11       7 A     "BCDA"  "B"   "A"  
12       7 D     "BCDAD" "B"   "D" 
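
To see what the pattern "\\w{0,5}$" and the final blanking step do on their own, here is a small base R sketch. It uses regmatches() and regexpr() in place of str_extract(), and ifelse() in place of if_else(); the input vector is made up for illustration.

```r
prefixes <- c("A", "AB", "ABCDAC", "ABCDACB")

# "\\w{0,5}$" matches up to five word characters anchored at the end of the string.
tail5 <- regmatches(prefixes, regexpr("\\w{0,5}$", prefixes))

# Blank out one-page paths, mirroring the last mutate() step of the answer.
tail5_blank <- ifelse(nchar(tail5) == 1, "", tail5)

print(tail5_blank)
```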