How can I get specific data out of a list (with lists) under a certain condition


I have a list of lists and would like to extract the activities that occur within the first 600 seconds (journey time < 600). The "journey time" starts at 0 and accumulates the time of each corresponding activity ("code").



So for example [["journey_time"]] could look like this 0, 46.7, 79.4, ...., 1800.

[["code"]] looks like StartPage, ClickItem1, ScrollItem1, ..., ClosePage.

"i" are the customers here.

I tried it for each customer alone, but I, of course, would prefer an iterative process with loops.

Thank you in advance! Appreciate it much!


CodePudding user response:

This should work.

homepage1 <- list(
  customer_data = list(
    activity_list = list(
      list(journey_time = 0, code = "StartPage"),
      list(journey_time = 46.7, code = "ClickItem1"),
      list(journey_time = 79.4, code = "ScrollItem1"),
      list(journey_time = 1800, code = "ClosePage")
    )
  )
)

# create an empty list to store the activity codes
activity_codes <- list()

# iterate over each element in the list of lists
for (i in seq_along(homepage1[["customer_data"]][["activity_list"]])) {
  # check if the journey time is less than 600
  if (homepage1[["customer_data"]][["activity_list"]][[i]][["journey_time"]] < 600) {
    # if it is, add the activity code to the list
    activity_codes <- c(activity_codes, homepage1[["customer_data"]][["activity_list"]][[i]][["code"]])
  }
}

# print the list of activity codes
print(activity_codes)
Edit: Updated from Python to R as mentioned by @Maël
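
Since the question asks for a loop over customers, here is a minimal sketch of how the same filter could be applied to several customers at once. The `customers` list and the helper `codes_before` are hypothetical names made up for illustration; the sketch assumes every customer holds an `activity_list` shaped like the one above.

```r
# Hypothetical structure: one list element per customer, each holding an
# activity_list shaped like the one in the answer above.
customers <- list(
  list(activity_list = list(
    list(journey_time = 0,    code = "StartPage"),
    list(journey_time = 46.7, code = "ClickItem1"),
    list(journey_time = 1800, code = "ClosePage")
  )),
  list(activity_list = list(
    list(journey_time = 0,     code = "StartPage"),
    list(journey_time = 650.2, code = "ScrollItem1")
  ))
)

# Hypothetical helper: return the codes of all activities whose
# journey_time is below the cutoff.
codes_before <- function(activity_list, cutoff = 600) {
  times <- vapply(activity_list, function(a) a[["journey_time"]], numeric(1))
  codes <- vapply(activity_list, function(a) a[["code"]], character(1))
  codes[times < cutoff]
}

# One character vector of activity codes per customer.
early_codes <- lapply(customers, function(cust) {
  codes_before(cust[["activity_list"]])
})
```

Using `vapply` rather than growing a list inside a `for` loop keeps the result a plain character vector and fails early if an entry is missing a field.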

CodePudding user response:

Your data structure looks like the result of a JSON -> R conversion, so if you still have the original JSON you can read it directly and do not need to convert the list to JSON first.

Having said this, you can use fromJSON (from the jsonlite package) with flatten = TRUE to get the relevant data in a nice data.frame format, which makes the processing much easier:

library(jsonlite)
(mdat <- fromJSON(toJSON(homepage1), flatten = TRUE))
# $customer_data
# $customer_data$activity_list
#   journey_time        code
# 1            0   StartPage
# 2         46.7  ClickItem1
# 3         79.4 ScrollItem1
# 4         1800   ClosePage

So all you need to do is to use cumsum on column journey_time to get cumulative timings and use that as a filter. This assumes that each timing measures the time spent on the element since the last visit and not from the beginning; if the latter is true, you do not need cumsum and can compare journey_time directly:

idx <- cumsum(mdat$customer_data$activity_list$journey_time) <= 600
mdat$customer_data$activity_list$code[idx]
# [1] "StartPage"   "ClickItem1"  "ScrollItem1"
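
The two interpretations of journey_time mentioned above can be compared side by side. This is only a sketch: the `activities` data frame is built by hand (an assumption, so that it runs without jsonlite) with the same columns as `mdat$customer_data$activity_list`, and it uses the strict `< 600` cutoff from the question.

```r
# Hand-built stand-in for mdat$customer_data$activity_list (assumption:
# plain numeric and character columns).
activities <- data.frame(
  journey_time = c(0, 46.7, 79.4, 1800),
  code = c("StartPage", "ClickItem1", "ScrollItem1", "ClosePage"),
  stringsAsFactors = FALSE
)

# Case 1: journey_time is already a running total, so filter on it directly.
cumulative_codes <- activities$code[activities$journey_time < 600]

# Case 2: journey_time is the time spent per activity, so accumulate first.
per_step_codes <- activities$code[cumsum(activities$journey_time) < 600]
```

For the sample data both cases happen to select the same three activities; they diverge as soon as the per-activity times add up past 600 before any single value does.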