How to add a time duration variable in r?-CodePudding

I want to create time duration variable in r but I haven't figure it out. My data looks like this:

company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
data <- data.frame(company, year, value)

I want it like this

company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
timeduration <- c(NA,1,2,3,1,2,1,1,NA,1,NA,1,2,1,2,3)
data1 <- data.frame(company, year, value, timeduration)

Zero(s) has two meanings in the value variable. If I use it after NA, I have to start counting from 1, but if 0 comes from after any value, it means that change in the value is constant. Therefore, I have to continue to count even if the value is changed to 0. I show this situation in the second data (data1). (Also, data should be grouped by considering companies). Thank you for your efforts!

CodePudding user response：

We define a reset variable corresponding to OP's explanation of zeros in value, then build groups accordingly (whenever the time count reset is TRUE, we start a new group, with cumsum()).

Then we can derive timeduration as a difference between year and first value of year within the group.

library(dplyr)

data %>% 
  group_by(company) %>% 
  mutate(reset = (value != 0) | (value == 0 & is.na(lag(value)))) %>% 
  group_by(company, group = cumsum(ifelse(is.na(reset), 0, reset))) %>% 
  mutate(timeduration = ifelse(is.na(value), NA, year - first(year)   1)) %>% 
  ungroup() %>% 
  select(- reset, - group) 

#> # A tibble: 16 × 4
#>    company  year value timeduration
#>    <chr>   <int> <dbl>        <dbl>
#>  1 A        1980    NA           NA
#>  2 A        1981     0            1
#>  3 A        1982     0            2
#>  4 A        1983     0            3
#>  5 A        1984     1            1
#>  6 A        1985     0            2
#>  7 A        1986    -1            1
#>  8 A        1987     1            1
#>  9 B        1980    NA           NA
#> 10 B        1981    -3            1
#> 11 B        1982    NA           NA
#> 12 B        1983     0            1
#> 13 B        1984     0            2
#> 14 B        1985     1            1
#> 15 B        1986     0            2
#> 16 B        1987     0            3

CodePudding user response：

Write a function that controls what you want to do with variable value, for example:

company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
timeduration <- c(NA,1,2,3,1,2,1,1,NA,1,NA,1,2,1,2,3)
data1 <- data.frame(company, year, value, timeduration)

library(tidyverse)

calc_td <- function(x) {
  res <- rep(0, length(x))
  tmp <- x
  tmp[is.na(x)] <- max(x, na.rm = TRUE)   1
  for (i in seq_along(tmp)) {
    if (i == 1L) {
      res[i] <- 0
    } else {
      if ((tmp[i] != tmp[i - 1]) & (tmp[i] != 0)) {res[i] <- 1} else {res[i] <- res[i - 1]   1}
    }
  }
  res[is.na(x)] <- NA
  res
}

data1 %>%
  group_by(company) %>%
  mutate(timeduration2 = calc_td(value)) %>%
  ungroup()