I have a dataframe with an event log from a tracking device. It notes the type of the event (stay at place or track/movement), the name of the event (home/work/other for stays, traffic mode for tracks), the distance and the start datetime of the event. Multiple tracks / mode changes can be logged in between stays at places.
type = c('place', 'track', 'track', 'place', 'track', 'place')
name = c('home', 'walk', 'bicycle', 'work', 'walk', 'other')
distance = c(0, 1120, 2300, 0, 4000, 0)
startdate = c('2021-12-19 10:00:00', '2021-12-19 11:00:00', '2021-12-19 11:05:00', '2021-12-19 11:15:00', '2021-12-19 12:00:00', '2021-12-19 12:30:00')
df = data.frame(type, name, distance, startdate)
I would like to assign a new attribute to every track indicating between which places it was logged, i.e. the last place noted before it and the next place noted after it, ignoring any other tracks in between:
fromto = c(NA, 'home-work', 'home-work', NA, 'work-other', NA)
df = data.frame(type, name, distance, startdate, fromto)
In a simpler setup I might use dplyr's lag() or a cumsum of sorts. However, because multiple tracks can follow each other, I need to look up the "name" attribute of the preceding and following place rows, which may be several rows away.
The data set will typically be very large; is there an R solution that does not require a loop?
CodePudding user response:
Coming at it from a slightly different angle: add an id column, filter for the rows where type == "place", and use mutate(), paste() and lead() to build the fromto column. Save that as a separate data frame and join it back to your original df by id. You can then tidyr::fill() the fromto column so each value is carried down to the tracks that follow, which also gives you a column to group by.
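library(dplyr)
library(tidyr)

# Row id so the place labels can be joined back later
df <- df %>% mutate(id = row_number())

# One row per place, labelled "<this place>-<next place>"
# (the last place has no successor, so its label ends in "NA")
changes <- df %>%
  filter(type == "place") %>%
  mutate(fromto = paste(name, lead(name, 1), sep = "-")) %>%
  select(id, fromto)

# Join the labels back and carry them down to the following tracks
df <- df %>%
  left_join(changes, by = "id") %>%
  fill(fromto)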
You could then filter for type == "track":
df %>% filter(type == "track")
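A join-free variant of the same fill idea, as a minimal sketch rather than something tested on real logs: copy the place names into two helper columns, fill one downwards (last place) and the other upwards (next place), then paste them together for tracks only. The helper columns from and to and the name result are made up for illustration, and it assumes the rows are already in chronological order.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (startdate and distance omitted for brevity)
type <- c("place", "track", "track", "place", "track", "place")
name <- c("home", "walk", "bicycle", "work", "walk", "other")
df <- data.frame(type, name, stringsAsFactors = FALSE)

result <- df %>%
  # Place names on place rows, NA on track rows (hypothetical helper columns)
  mutate(from = if_else(type == "place", name, NA_character_),
         to = from) %>%
  fill(from, .direction = "down") %>%  # last place seen before each row
  fill(to, .direction = "up") %>%      # next place seen after each row
  # Label tracks only, and only when both neighbouring places exist
  mutate(fromto = if_else(type == "track" & !is.na(from) & !is.na(to),
                          paste(from, to, sep = "-"),
                          NA_character_)) %>%
  select(-from, -to)
```

With this guard, tracks logged before the first place or after the last place keep NA instead of getting a label that contains "NA".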
CodePudding user response:
A possible solution:
library(tidyverse)
type = c('place', 'track', 'track', 'place', 'track', 'place')
name = c('home', 'walk', 'bicycle', 'work', 'walk', 'other')
distance = c(0, 1120, 2300, 0, 4000, 0)
startdate = c('2021-12-19 10:00:00', '2021-12-19 11:00:00', '2021-12-19 11:05:00', '2021-12-19 11:15:00', '2021-12-19 12:00:00', '2021-12-19 12:30:00')
df = data.frame(type, name ,distance, startdate)
df %>%
  mutate(id = row_number()) %>%
  filter(type == "place") %>%
  mutate(fromto = str_c(lag(name), name, sep = "_")) %>%
  full_join(mutate(df, id = row_number())) %>%
  arrange(id) %>%
  fill(fromto, .direction = "up") %>%
  mutate(fromto = if_else(type == "place", NA_character_, fromto), id = NULL)
#>    type    name distance           startdate     fromto
#> 1 place    home        0 2021-12-19 10:00:00       <NA>
#> 2 track    walk     1120 2021-12-19 11:00:00  home_work
#> 3 track bicycle     2300 2021-12-19 11:05:00  home_work
#> 4 place    work        0 2021-12-19 11:15:00       <NA>
#> 5 track    walk     4000 2021-12-19 12:00:00 work_other
#> 6 place   other        0 2021-12-19 12:30:00       <NA>
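Since the question mentions "a cumsum of sorts", here is a rough base R sketch of that idea, offered as one possible approach and not benchmarked on large data: cumsum() over the place rows gives, for every row, the index of the most recent place, and that index plus one points at the next place. The names is_place, places, k, from and to are illustrative only, and the rows are assumed to be sorted by startdate.

```r
# Sample data from the question (startdate and distance omitted for brevity)
type <- c("place", "track", "track", "place", "track", "place")
name <- c("home", "walk", "bicycle", "work", "walk", "other")
df <- data.frame(type, name, stringsAsFactors = FALSE)

is_place <- df$type == "place"
places <- df$name[is_place]     # place names in order of occurrence
k <- cumsum(is_place)           # index of the most recent place for each row

# k == 0 means no place has been seen yet, so map it to NA
from <- places[ifelse(k == 0, NA, k)]
# Indexing past the end of 'places' returns NA, i.e. no next place
to <- places[k + 1]

# Label tracks only, and only when both neighbouring places exist
df$fromto <- ifelse(!is_place & !is.na(from) & !is.na(to),
                    paste(from, to, sep = "-"),
                    NA_character_)
```

Everything here is vectorised, so no explicit loop or join is needed; for the sample data it produces the fromto values asked for in the question.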