I have two columns. One is a True/False flag for a series of data; the other is a timestamp. I want to write code that detects when the Boolean column changes to true and calculates the elapsed time, from the timestamp column, until it changes back to false. This should repeat for the entire series, and the resulting times should be binned in a data frame for a histogram. Apologies for the poor attempt, I really don't know where to start. Note that the running column is stored as characters; perhaps I need to convert it to Boolean for this to work?
running <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)
datafinal <- data.frame()
for (i in dataset){
if running == f,
result <- sum(i:n)
datafinal <- c(datafinal, result)
}
CodePudding user response:
Converting the running column to Boolean and working with a for-loop is one way. You can also operate directly on the data frame, which you already have. This solution uses the tidyverse library plus some date handling from the lubridate library. I encourage you to learn these libraries for this kind of problem. The rleid() function from the data.table library adds 1 every time the value in the target column running changes, so consecutive rows in the same state share the same id.
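To make the rleid() behaviour concrete, here is a minimal sketch on the running vector from the question. The names is_running and run_id are only illustrative; the conversion to logical is optional for rleid() but answers the question about turning the characters into Booleans.

```r
library(data.table)

running <- c("t", "t", "f", "f", "t", "f", "t", "t")

# Convert the character flags to a logical vector ("t" -> TRUE, "f" -> FALSE).
is_running <- running == "t"

# rleid() gives every consecutive run of identical values its own integer id.
run_id <- rleid(is_running)
print(run_id)
# 1 1 2 2 3 4 5 5
```

Rows that share a run_id belong to the same uninterrupted state, which is what the grouping step in the solution relies on.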
running <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)
# convert the time strings to date-time objects
dataset$time = lubridate::ymd_hms(dataset$time, tz="UTC")
library(tidyverse)
solution = dataset %>%
  mutate(Grp=data.table::rleid(running)) %>% # rows in the same state before a change get the same value
  group_by(Grp) %>% # rows in the same state are grouped together
  slice(1) %>% # keep the first row of each group
  ungroup() %>% # grouping is no longer needed
  # calculate the difference between each row and the previous one (lag(n=1))
  mutate(timeLength = difftime(time, lag(time), units="secs"))
Output:
# A tibble: 5 x 4
  time                running   Grp timeLength
  <dttm>              <chr>   <int> <drtn>
1 2022-01-01 00:00:10 t           1 NA secs
2 2022-01-01 00:00:30 f           2 20 secs
3 2022-01-01 00:00:50 t           3 20 secs
4 2022-01-01 00:01:00 f           4 10 secs
5 2022-01-01 00:01:10 t           5 10 secs
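If you want to get rid of the first NA row, just add %>% filter(!is.na(timeLength)) to the pipeline. Note that each timeLength measures the run that ended at that row, so the rows where running is f hold the durations of the preceding true periods.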
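The question also asks for the true periods to be binned for a histogram. The sketch below is one possible way to get there, not part of the original solution: it uses lead() instead of lag() so that each duration sits on the row where the run starts, keeps only the runs where running is t, and bins them with hist(plot = FALSE). The names run_lengths, bins and binned and the 10-second bin width are assumptions chosen for this example.

```r
library(dplyr)

running <- c("t", "t", "f", "f", "t", "f", "t", "t")
time <- as.POSIXct(c("2022-01-01 00:00:10", "2022-01-01 00:00:20",
                     "2022-01-01 00:00:30", "2022-01-01 00:00:40",
                     "2022-01-01 00:00:50", "2022-01-01 00:01:00",
                     "2022-01-01 00:01:10", "2022-01-01 00:01:20"), tz = "UTC")
dataset <- data.frame(time, running)

run_lengths <- dataset %>%
  mutate(Grp = data.table::rleid(running)) %>%
  group_by(Grp) %>%
  slice(1) %>%                      # first row of each run = start of the run
  ungroup() %>%
  # duration of a run = start of the next run minus start of this run
  mutate(timeLength = as.numeric(difftime(lead(time), time, units = "secs"))) %>%
  # keep only "t" runs that were closed by a later change
  filter(running == "t", !is.na(timeLength))

print(run_lengths$timeLength)
# 20 10

# Bin the durations without plotting (assumed 10-second bins from 0 to 30).
bins <- hist(run_lengths$timeLength, breaks = seq(0, 30, by = 10), plot = FALSE)
binned <- data.frame(bin_upper = bins$breaks[-1], count = bins$counts)
print(binned)
```

The last t run (starting at 00:01:10) never changes back to f in the sample data, so it has no end time and is dropped. Calling hist() without plot = FALSE draws the histogram directly.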
Update: here is how you could do it with a for-loop and nested if-else. Note that the code is longer and harder to follow.
dataset$time = lubridate::ymd_hms(dataset$time, tz="UTC")
# empty vector for tracking the state of each row seen so far
current = c()
# datafinal: empty data frame that collects one row per state change
datafinal = data.frame()
# iterate over the row indices
for (i in seq_len(nrow(dataset))){
  # store the current value of running
  current = c(current, dataset[i,]$running)
  if (i>1){ # the first row has no previous row to compare with
    if (current[i] == current[i-1]){
      next # same state (true or false) as the previous row: skip the rest of this iteration
    }
    else { # state changed: time elapsed since the previous state began
      result = difftime(dataset[i,]$time, previous_time, units="secs")
    }
    # (note: when 'next' fires above, nothing below runs for that iteration)
    # build the output row for this iteration
    new_row = cbind(dataset[i,], result)
    # append the row to the final data frame
    datafinal = rbind(datafinal, new_row)
  }
  # remember when the current state began (set on the first row and on every change)
  previous_time = dataset[i,]$time
}
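For comparison, the same durations can be obtained without a loop and without extra packages by using base R's rle(). This is a swapped-in alternative rather than the approach above, and the names runs, starts, start_times and result are only illustrative. Unlike the loop, it attaches each duration to the run it describes rather than to the row where the next state begins.

```r
running <- c("t", "t", "f", "f", "t", "f", "t", "t")
time <- as.POSIXct(c("2022-01-01 00:00:10", "2022-01-01 00:00:20",
                     "2022-01-01 00:00:30", "2022-01-01 00:00:40",
                     "2022-01-01 00:00:50", "2022-01-01 00:01:00",
                     "2022-01-01 00:01:10", "2022-01-01 00:01:20"), tz = "UTC")

# Run-length encoding: lengths and values of consecutive identical states.
runs <- rle(running)

# Row index at which each run starts.
starts <- cumsum(c(1, head(runs$lengths, -1)))
start_times <- time[starts]

# Duration of each closed run: start of the next run minus its own start.
durations <- as.numeric(difftime(start_times[-1], head(start_times, -1),
                                 units = "secs"))

# The final run has no following change, so it is left out.
result <- data.frame(start = head(start_times, -1),
                     running = head(runs$values, -1),
                     duration_secs = durations)
print(result)
```

Filtering result to the rows where running is t leaves the true periods only.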