Hi swarm intelligence,
I am working on a dataset of customer journeys. I want to collapse consecutive repeats of the same activity into a single activity and sum the time spent on them.
The first row of my current data frame looks like this:
User | Activities | Time |
---|---|---|
1 | c("openPage1", "writeText", "writeText", "writeText", "closePage1") | c(10, 40, 30, 20, 15) |
The output, however, should look like this:
User | Activities | Time |
---|---|---|
1 | c("openPage1", "writeText", "closePage1") | c(10, 90, 15) |
Could you please tell me how to aggregate within a vector like this?
Thank you so much!
Marius
CodePudding user response:
Your data structure:
dat <- structure(list(Activities = list(c("openPage1", "writeText",
"writeText", "writeText", "closePage1")), Time = list(c(10, 40,
30, 20, 15))), row.names = c(NA, -1L), class = "data.frame")
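We can sum the times per activity with tapply(). Note that this groups by activity name, so the result comes back in alphabetical order and non-consecutive repeats of an activity would be merged as well: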
agg <- Map(function (v, g) tapply(v, g, FUN = sum), dat$Time, dat$Activities)
dat$Activities <- lapply(agg, names)
dat$Time <- lapply(agg, unname)
dat
#                        Activities       Time
#1 closePage1, openPage1, writeText 15, 10, 90
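If the original order matters and only consecutive repeats should be merged (as in the desired output in the question), a run-length approach may be closer to what is asked. The following is a minimal sketch using base R's rle(); the helper name collapse_runs is made up for illustration, and the data is rebuilt first because dat was overwritten above.

```r
# Rebuild the example data (same structure as in this answer)
dat <- structure(list(Activities = list(c("openPage1", "writeText",
  "writeText", "writeText", "closePage1")), Time = list(c(10, 40,
  30, 20, 15))), row.names = c(NA, -1L), class = "data.frame")

# collapse_runs is a hypothetical helper, not part of any package:
# it merges consecutive identical activities and sums their times
collapse_runs <- function(act, time) {
  r <- rle(act)                                    # runs of consecutive equal values
  run_id <- rep(seq_along(r$lengths), r$lengths)   # one integer id per run
  list(Activities = r$values,
       Time = as.vector(tapply(time, run_id, sum)))  # sum of times within each run
}

# Apply the helper row by row and write the results back as list columns
res <- Map(collapse_runs, dat$Activities, dat$Time)
dat$Activities <- lapply(res, `[[`, "Activities")
dat$Time <- lapply(res, `[[`, "Time")
dat
```

Because the grouping is done on run ids rather than on activity names, a later "writeText" that follows some other activity would stay a separate entry instead of being added to the first one.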
CodePudding user response:
An idea via base R can be to unlist and aggregate, i.e.
aggregate(Time ~ Activities,
          transform(data.frame(sapply(dat, unlist)), Time = as.numeric(Time)),
          FUN = sum)
#  Activities Time
#1 closePage1   15
#2  openPage1   10
#3  writeText   90
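Like any grouping by activity name, the call above sorts the result alphabetically and would merge repeats that are not adjacent. If that is not wanted, one possible variation is to add a run counter before aggregating. This is only a sketch: the columns User and Run are my own additions, with the row index standing in for the user id, which is an assumption since dat has no User column.

```r
# Rebuild the example data (same structure as in the first answer)
dat <- structure(list(Activities = list(c("openPage1", "writeText",
  "writeText", "writeText", "closePage1")), Time = list(c(10, 40,
  30, 20, 15))), row.names = c(NA, -1L), class = "data.frame")

# Long format: one row per activity; the row index is used as user id (assumption)
long <- data.frame(
  User = rep(seq_len(nrow(dat)), lengths(dat$Activities)),
  Activities = unlist(dat$Activities),
  Time = unlist(dat$Time)
)

# A new run starts whenever the user or the activity differs from the previous row
key <- paste(long$User, long$Activities)
long$Run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Sum the time within each run, then restore the original order
out <- aggregate(Time ~ User + Run + Activities, data = long, FUN = sum)
out <- out[order(out$Run), ]
out
```

Sorting by Run at the end puts the rows back in the order in which the activities occurred.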