Home > Blockchain >  Populating one column based on two columns in R
Populating one column based on two columns in R

Time:06-24

I have the below dataset, and I am trying to create a more meaningful path.

Row# Session Click Page
1 123 Enter Pg1
2 123 phpbutton Pg1
3 123 Enter Pg2
4 123 Enter Pg3
5 123 Form1 Pg3
6 123 Form2 Pg3
7 123 Form1 Pg3
8 123 Form1 Pg3
9 123 abcbutton Pg3
10 123 Enter Pg1
11 123 xyzselect Pg1
12 123 Enter Pg4
13 123 Enter Pg3
14 123 Back Pg3
15 123 Enter Pg1

I would like the outcome to look this:

Session Activity
123 Pg1
123 phpbutton
123 Pg2
123 Pg3
123 Form1
123 Form2
123 Form1
123 abcbutton
123 Pg1
123 xyzselect
123 Pg4
123 Pg3
123 Back
123 Pg1

If the Click column has Enter, then the Activity column should show the Page. But, if the subsequent page is equal to the previous page, then the Activity column should show the value from the Click column. For instance, row# 1 and 2 have the same Page numbers, so I would like the Activity column to show, Pg1, then, phpbutton. But, if the Click column has two or more subsequent same values, as seen in Row# 7 and 8, I would like the Activity column to show just one entry of Form 1.

Thanks a lot.

CodePudding user response:

Try this

df |> group_by(Session) |> 
mutate(Activity = case_when(Click == "Enter" ~ Page , 
lag(Page) == Page ~ Click)) |> select(Session , Activity)

CodePudding user response:

library(dplyr)

dat$activity <- ifelse(dat$click == "Enter" & (lag(dat$page) != dat$page | is.na(lag(dat$page))), dat$page,
       ifelse(lag(dat$page) == dat$page, dat$click, NA))

   row session     click page  activity
1    1     123     Enter  Pg1       Pg1
2    2     123 phpbutton  Pg1 phpbutton
3    3     123     Enter  Pg2       Pg2
4    4     123     Enter  Pg3       Pg3
5    5     123     Form1  Pg3     Form1
6    6     123     Form2  Pg3     Form2
7    7     123     Form1  Pg3     Form1
8    8     123     Form1  Pg3     Form1
9    9     123 abcbutton  Pg3 abcbutton
10  10     123     Enter  Pg1       Pg1
11  11     123 xyzselect  Pg1 xyzselect
12  12     123     Enter  Pg4       Pg4
13  13     123     Enter  Pg3       Pg3
14  14     123      Back  Pg3      Back
15  15     123     Enter  Pg1       Pg1

You could remove duplicate, consecutive rows like this:

dat[cumsum(rle(paste0(dat$session, dat$click, dat$page, dat$activity))$length),]

   row session     click page  activity
1    1     123     Enter  Pg1       Pg1
2    2     123 phpbutton  Pg1 phpbutton
3    3     123     Enter  Pg2       Pg2
4    4     123     Enter  Pg3       Pg3
5    5     123     Form1  Pg3     Form1
6    6     123     Form2  Pg3     Form2
8    8     123     Form1  Pg3     Form1
9    9     123 abcbutton  Pg3 abcbutton
10  10     123     Enter  Pg1       Pg1
11  11     123 xyzselect  Pg1 xyzselect
12  12     123     Enter  Pg4       Pg4
13  13     123     Enter  Pg3       Pg3
14  14     123      Back  Pg3      Back
15  15     123     Enter  Pg1       Pg1

data:

structure(list(row = 1:15, session = c(123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L
), click = c("Enter", "phpbutton", "Enter", "Enter", "Form1", 
"Form2", "Form1", "Form1", "abcbutton", "Enter", "xyzselect", 
"Enter", "Enter", "Back", "Enter"), page = c("Pg1", "Pg1", "Pg2", 
"Pg3", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3", "Pg1", "Pg1", "Pg4", 
"Pg3", "Pg3", "Pg1")), row.names = c(NA, -15L), class = "data.frame")
  • Related