Calculating the time length of each binary/boolean column as reference

I have two columns. One holds True/False values for a series of data, and the dataset also has a timestamp column. I want to write code that detects when the Boolean column changes to true and measures the time, from the timestamp column, until it changes back to false. This should repeat over the entire series, with the resulting times collected in a data frame for a histogram. Apologies for the poor attempt, I really don't know where to start. Note that the running column is stored as characters; perhaps I need to convert it to Boolean for this to work?

running  <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)

datafinal <- data.frame()    
for (i in dataset){
   if running == f,
   result <- sum(i:n)
datafinal <- c(datafinal, result)
}

CodePudding user response:

Converting the running column to Boolean and looping over the rows with a for-loop is one way. You can also work directly on the data frame, which you already have. The solution below uses the tidyverse library, with the timestamps parsed by the lubridate library. I encourage you to learn these libraries for this kind of problem.

The rleid() function from the data.table library assigns a run id that increases by 1 every time the value in the target column, running, changes.
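
To see what such run ids look like on the sample data, here is a minimal sketch in base R. It uses cumsum() over the change points as a stand-in for data.table::rleid(), so it illustrates the idea and is not the library's implementation; run_id is a name chosen for this example.

```r
running <- c("t","t","f","f","t","f","t","t")

# TRUE wherever a value differs from the one before it;
# the first element always starts a new run
changed <- c(TRUE, running[-1] != running[-length(running)])

# cumulative count of changes = run id of each row
run_id <- cumsum(changed)
run_id
# 1 1 2 2 3 4 5 5
```

Rows that share a run id belong to the same uninterrupted stretch of "t" or "f", which is what the group_by() step in the solution relies on.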

running  <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)

# parse the time strings into date-time (POSIXct) objects
dataset$time = lubridate::ymd_hms(dataset$time,tz="UTC")

library(tidyverse)
solution = dataset %>% 
  mutate(Grp=data.table::rleid(running)) %>% # rows in the same state before change get same value
  group_by(Grp) %>% # rows in the same state are grouped together
  slice(1) %>% # keep first row
  ungroup %>%  # you don't need grouping anymore
  mutate(timeLength = difftime(time, lag(time), units="secs")) 
  # calculate the differences between a row and previous one (lag(n=1))

Output:

# A tibble: 5 x 4
  time                running   Grp timeLength
  <dttm>              <chr>   <int> <drtn>    
1 2022-01-01 00:00:10 t           1 NA secs   
2 2022-01-01 00:00:30 f           2 20 secs   
3 2022-01-01 00:00:50 t           3 20 secs   
4 2022-01-01 00:01:00 f           4 10 secs   
5 2022-01-01 00:01:10 t           5 10 secs

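If you want to get rid of the first row, whose timeLength is NA, just add %>% filter(!is.na(timeLength)) to the end of the pipeline. Note that each timeLength is the time elapsed since the previous change, so it is the duration of the previous state: the 20 secs on the first "f" row is how long the preceding "t" run lasted. The final run has no following change, so its duration is not reported.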
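
The question also asks for the durations of the true runs, binned for a histogram. The following is a minimal base R sketch of one way to get there, not part of the original answer. It assumes that a run lasts from its first timestamp to the first timestamp of the next run (the same convention as the pipeline above), that the last run is still open and is dropped, and that 10-second bins are acceptable. The names is_on, runs, true_runs and h are made up for this example.

```r
running <- c("t","t","f","f","t","f","t","t")
time <- as.POSIXct(c("2022-01-01 00:00:10", "2022-01-01 00:00:20",
                     "2022-01-01 00:00:30", "2022-01-01 00:00:40",
                     "2022-01-01 00:00:50", "2022-01-01 00:01:00",
                     "2022-01-01 00:01:10", "2022-01-01 00:01:20"), tz = "UTC")

# convert the character flags to logical
is_on <- running == "t"

# run id: increases by 1 at every change of state
run_id <- cumsum(c(TRUE, is_on[-1] != is_on[-length(is_on)]))

# first row of every run
first <- !duplicated(run_id)
runs <- data.frame(start = time[first], state = is_on[first])

# duration = start of next run - start of this run; the last run is open (NA)
runs$secs <- c(as.numeric(diff(runs$start), units = "secs"), NA)

# keep only completed TRUE runs
true_runs <- subset(runs, state & !is.na(secs))

# bin into 10-second bins (assumed width); plot(h) draws the histogram
h <- hist(true_runs$secs, breaks = seq(0, 30, by = 10), plot = FALSE)
data.frame(bin_upper = h$breaks[-1], count = h$counts)
```

On the sample data the two completed true runs last 20 and 10 seconds. Because the duration is stored on the row of the run it belongs to, filtering on state is enough to separate true runs from false ones.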

Update: here is how you could do it with a for-loop and nested if-else. Note that the code is longer and harder to follow.

dataset$time = lubridate::ymd_hms(dataset$time,tz="UTC")
# vector tracking the value of running seen at each row so far
current = c()
# empty data frame that collects one row per state change
datafinal  = data.frame()
# iterate over the row indices
for (i in seq_len(nrow(dataset))){
  # append the current value of running
  current = c(current,dataset[i,]$running)
  if (i>1){ # the first row has no previous row to compare with
    if (current[i] == current[i-1]){
      next # same state as the previous row: skip to the next iteration
    }
    else {  # state changed: time elapsed since the previous state began
      result = difftime(dataset[i,]$time, previous_time, units="secs")
    }
    # (note: when 'next' fires, nothing below it runs, including the
    # previous_time update at the end of the loop)

    # build the output row for this state change
    new_row = cbind(dataset[i,],result)
    # append it to the final data frame
    datafinal = rbind(datafinal,new_row)
  }
  # remember the first timestamp of the current state
  # (reached only on the first row or right after a change)
  previous_time = dataset[i,]$time 
}