Suppose I have this dataframe, df, in R:
UserID <- c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9)
PathID <- c(1,2,3,1,2,1,2,1,2,3)
Page <- c("home", "about", "services", "home", "pricing",
"pricing", "home", "about", "home", "services")
df <- data.frame(UserID, PathID, Page)
I am trying to write code that returns the rows (along with UserID and PathID) for users who visit the 'home' page but do not visit the 'about' page afterwards. My output should look like this:
UserID <- c(5, 5, 7, 7, 9, 9, 9)
PathID <- c(1,2,1,2,1,2,3)
Page <- c("home", "pricing", "pricing", "home", "about", "home", "services")
df1 <- data.frame(UserID, PathID, Page)
I would really appreciate some help here.
CodePudding user response:
With a couple of filtering conditions, you can remove the whole group (!any) if it contains the sequence "home", "about".
library(dplyr)
df %>%
  group_by(UserID) %>%
  # keep a user only if no "about" row is directly preceded by a "home" row
  filter(!any(Page == "about" & lag(Page, default = "nothome") == "home"))
  UserID PathID Page
1      5      1 home
2      5      2 pricing
3      7      1 pricing
4      7      2 home
5      9      1 about
6      9      2 home
7      9      3 services
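To see why the filter keeps users 5, 7 and 9, it may help to spell out the intermediate values per group. The following is only an illustrative sketch: the column names prev_page, home_then_about and drop_group are made up here and are not part of the answer above, and it assumes the rows are already ordered by PathID within each user, as in the question.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  UserID = c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9),
  PathID = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3),
  Page   = c("home", "about", "services", "home", "pricing",
             "pricing", "home", "about", "home", "services")
)

# prev_page, home_then_about and drop_group are hypothetical helper columns
steps <- df %>%
  group_by(UserID) %>%
  mutate(
    # page visited just before this one; "nothome" fills each user's first row
    prev_page = lag(Page, default = "nothome"),
    # TRUE only on an "about" row that directly follows a "home" row
    home_then_about = Page == "about" & prev_page == "home",
    # TRUE for every row of a user that has at least one such row
    drop_group = any(home_then_about)
  ) %>%
  ungroup()

print(steps)
```

Only user 1 gets drop_group set to TRUE, so negating it inside filter() removes that whole group and leaves the other users untouched.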
CodePudding user response:
An option with data.table
library(data.table)
setDT(df)[df[, .I[!any(Page == "about" &
                       shift(Page) == "home", na.rm = TRUE)], UserID]$V1]
   UserID PathID     Page
    <num>  <num>   <char>
1:      5      1     home
2:      5      2  pricing
3:      7      1  pricing
4:      7      2     home
5:      9      1    about
6:      9      2     home
7:      9      3 services
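If you would rather avoid extra packages, the same idea can probably be written in base R. This is a rough sketch, not one of the answers above: prev_page, hit, bad_users and result are names chosen for illustration, and it again assumes the rows are ordered by PathID within each UserID.

```r
# Same data as in the question
df <- data.frame(
  UserID = c(1, 1, 1, 5, 5, 7, 7, 9, 9, 9),
  PathID = c(1, 2, 3, 1, 2, 1, 2, 1, 2, 3),
  Page   = c("home", "about", "services", "home", "pricing",
             "pricing", "home", "about", "home", "services")
)

# Previous page within each user; NA on each user's first row
prev_page <- ave(df$Page, df$UserID,
                 FUN = function(p) c(NA, head(p, -1)))

# Rows where "about" directly follows "home" (NA treated as no match)
hit <- df$Page == "about" & !is.na(prev_page) & prev_page == "home"

# Users with at least one such row are dropped entirely
bad_users <- unique(df$UserID[hit])
result <- df[!df$UserID %in% bad_users, ]

print(result)
```

The row names of result keep their original positions (4 to 10); rownames(result) <- NULL resets them if that matters.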