I have nested lists and would like to extract the activities that occur within the first 600 seconds (journey time < 600). The journey time starts at 0 and accumulates the time of each corresponding activity "code".
homepage1[["customer_data"]][["activity_list"]][[i]][["journey_time"]]
homepage1[["customer_data"]][["activity_list"]][[i]][["code"]]
For example, [["journey_time"]] could look like this: 0, 46.7, 79.4, ..., 1800.
[["code"]] looks like this: StartPage, ClickItem1, ScrollItem1, ..., ClosePage.
Here, i indexes the customers.
I managed it for each customer individually, but I would prefer an iterative approach with a loop.
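For reference, the structure described above can be sketched as follows. This assumes each activity_list[[i]] is one customer holding parallel journey_time and code vectors; the second customer's values are hypothetical.

```r
# Hypothetical sample data: each element of activity_list is one customer,
# holding parallel vectors of cumulative journey times and activity codes
homepage1 <- list(
  customer_data = list(
    activity_list = list(
      list(journey_time = c(0, 46.7, 79.4, 1800),
           code = c("StartPage", "ClickItem1", "ScrollItem1", "ClosePage")),
      list(journey_time = c(0, 120, 650),
           code = c("StartPage", "ClickItem2", "ClosePage"))
    )
  )
)

# Journey times and codes of customer 1
homepage1[["customer_data"]][["activity_list"]][[1]][["journey_time"]]
homepage1[["customer_data"]][["activity_list"]][[1]][["code"]]
```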
Thank you in advance, much appreciated!
Marius
User response:
This should work.
homepage1 <- list(
  customer_data = list(
    activity_list = list(
      list(journey_time = 0, code = "StartPage"),
      list(journey_time = 46.7, code = "ClickItem1"),
      list(journey_time = 79.4, code = "ScrollItem1"),
      list(journey_time = 1800, code = "ClosePage")
    )
  )
)
# create an empty character vector to store the activity codes
activity_codes <- character(0)
# iterate over each activity (seq_along also handles an empty list safely)
for (i in seq_along(homepage1[["customer_data"]][["activity_list"]])) {
  activity <- homepage1[["customer_data"]][["activity_list"]][[i]]
  # keep the activity only if its journey time is below 600 seconds
  if (activity[["journey_time"]] < 600) {
    activity_codes <- c(activity_codes, activity[["code"]])
  }
}
# print the collected activity codes
print(activity_codes)
Edit: updated from Python to R, as pointed out by @Maël.
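The loop above treats every element of activity_list as one activity. If instead each activity_list[[i]] is one customer holding parallel journey_time and code vectors, as the question describes, a per-customer loop might look like this minimal sketch. The sample data, including the second customer, is hypothetical.

```r
# Hypothetical sample data in the per-customer layout described in the question
homepage1 <- list(customer_data = list(activity_list = list(
  list(journey_time = c(0, 46.7, 79.4, 1800),
       code = c("StartPage", "ClickItem1", "ScrollItem1", "ClosePage")),
  list(journey_time = c(0, 120, 650),
       code = c("StartPage", "ClickItem2", "ClosePage"))
)))

customers <- homepage1[["customer_data"]][["activity_list"]]
# pre-allocate one result slot per customer
first_600 <- vector("list", length(customers))
for (i in seq_along(customers)) {
  # logical index of the activities with a journey time below 600 seconds
  keep <- customers[[i]][["journey_time"]] < 600
  first_600[[i]] <- customers[[i]][["code"]][keep]
}
print(first_600)
```

Because the comparison is vectorised, the loop only has to run over customers, not over individual activities.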
User response:
Your data structure looks like the result of a JSON -> R conversion, so if you still have the original JSON, you may not need to convert it back to JSON first.
Having said this, you can use fromJSON with flatten = TRUE to get the relevant data in a nice data.frame format, which makes the processing much easier:
library(jsonlite)
(mdat <- fromJSON(toJSON(homepage1), flatten = TRUE))
# $customer_data
# $customer_data$activity_list
# journey_time code
# 1 0 StartPage
# 2 46.7 ClickItem1
# 3 79.4 ScrollItem1
# 4 1800 ClosePage
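If jsonlite is not available, a base-R sketch can build a similar data.frame from the one-activity-per-element structure used in the first answer. It assumes every element has scalar journey_time and code entries; it is an alternative to, not a reproduction of, what fromJSON does internally.

```r
homepage1 <- list(customer_data = list(activity_list = list(
  list(journey_time = 0, code = "StartPage"),
  list(journey_time = 46.7, code = "ClickItem1"),
  list(journey_time = 79.4, code = "ScrollItem1"),
  list(journey_time = 1800, code = "ClosePage")
)))

# Turn each activity into a one-row data.frame, then stack the rows
activities <- do.call(rbind, lapply(
  homepage1[["customer_data"]][["activity_list"]],
  function(a) data.frame(journey_time = a[["journey_time"]],
                         code = a[["code"]],
                         stringsAsFactors = FALSE)
))
print(activities)
```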
So all you need to do is apply cumsum to the journey_time column to get cumulative timings and use that as a filter (this assumes each timing measures the time spent on the element since the previous activity rather than since the beginning; if the latter is true, you do not need cumsum):
idx <- cumsum(mdat$customer_data$activity_list$journey_time) < 600
unlist(mdat$customer_data$activity_list$code[idx])
# [1] "StartPage" "ClickItem1" "ScrollItem1"
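Both readings of journey_time (already cumulative, or per-activity durations that need cumsum) can be wrapped in one small helper and applied to every customer with lapply. This is a minimal sketch: codes_within is a hypothetical helper, the per-customer data is made up, and it uses < 600 as stated in the question.

```r
# Hypothetical helper (not from any package): return the codes of the
# activities whose elapsed time is below `limit` seconds.
# cumulative = TRUE:  journey_time already counts from the start (0, 46.7, ...)
# cumulative = FALSE: journey_time holds per-activity durations, so cumsum()
#                     is applied first
codes_within <- function(journey_time, code, limit = 600, cumulative = TRUE) {
  elapsed <- if (cumulative) journey_time else cumsum(journey_time)
  code[elapsed < limit]
}

# Hypothetical per-customer data (one list element per customer)
customers <- list(
  list(journey_time = c(0, 46.7, 79.4, 1800),
       code = c("StartPage", "ClickItem1", "ScrollItem1", "ClosePage")),
  list(journey_time = c(0, 300, 400),
       code = c("StartPage", "ClickItem2", "ClosePage"))
)

# Apply the filter to every customer in one go, under both readings
within_cumulative <- lapply(customers, function(cu)
  codes_within(cu[["journey_time"]], cu[["code"]]))
within_durations <- lapply(customers, function(cu)
  codes_within(cu[["journey_time"]], cu[["code"]], cumulative = FALSE))

print(within_cumulative)
print(within_durations)
```

For the second customer the two readings differ: 0, 300, 400 keeps all three codes when read as cumulative times, but only StartPage and ClickItem2 when read as durations, because cumsum() turns them into 0, 300, 700.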