I want to create time duration variable in r but I haven't figure it out. My data looks like this:
company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
data <- data.frame(company, year, value)
I want it like this
company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
timeduration <- c(NA,1,2,3,1,2,1,1,NA,1,NA,1,2,1,2,3)
data1 <- data.frame(company, year, value, timeduration)
Zero(s) has two meanings in the value variable. If I use it after NA, I have to start counting from 1, but if 0 comes from after any value, it means that change in the value is constant. Therefore, I have to continue to count even if the value is changed to 0. I show this situation in the second data (data1). (Also, data should be grouped by considering companies). Thank you for your efforts!
CodePudding user response:
We define a reset
variable corresponding to OP's explanation of zeros in value
, then build group
s accordingly (whenever the time count reset
is TRUE
, we start a new group
, with cumsum()
).
Then we can derive timeduration
as a difference between year
and first value of year
within the group
.
library(dplyr)
data %>%
group_by(company) %>%
mutate(reset = (value != 0) | (value == 0 & is.na(lag(value)))) %>%
group_by(company, group = cumsum(ifelse(is.na(reset), 0, reset))) %>%
mutate(timeduration = ifelse(is.na(value), NA, year - first(year) 1)) %>%
ungroup() %>%
select(- reset, - group)
#> # A tibble: 16 × 4
#> company year value timeduration
#> <chr> <int> <dbl> <dbl>
#> 1 A 1980 NA NA
#> 2 A 1981 0 1
#> 3 A 1982 0 2
#> 4 A 1983 0 3
#> 5 A 1984 1 1
#> 6 A 1985 0 2
#> 7 A 1986 -1 1
#> 8 A 1987 1 1
#> 9 B 1980 NA NA
#> 10 B 1981 -3 1
#> 11 B 1982 NA NA
#> 12 B 1983 0 1
#> 13 B 1984 0 2
#> 14 B 1985 1 1
#> 15 B 1986 0 2
#> 16 B 1987 0 3
CodePudding user response:
Write a function that controls what you want to do with variable value, for example:
company <- rep(c("A", "B"), each=8)
year <- rep(1980:1987, 2)
value <- c(NA,0,0,0,1,0,-1,1,NA,-3,NA,0,0,1,0,0)
timeduration <- c(NA,1,2,3,1,2,1,1,NA,1,NA,1,2,1,2,3)
data1 <- data.frame(company, year, value, timeduration)
library(tidyverse)
calc_td <- function(x) {
res <- rep(0, length(x))
tmp <- x
tmp[is.na(x)] <- max(x, na.rm = TRUE) 1
for (i in seq_along(tmp)) {
if (i == 1L) {
res[i] <- 0
} else {
if ((tmp[i] != tmp[i - 1]) & (tmp[i] != 0)) {res[i] <- 1} else {res[i] <- res[i - 1] 1}
}
}
res[is.na(x)] <- NA
res
}
data1 %>%
group_by(company) %>%
mutate(timeduration2 = calc_td(value)) %>%
ungroup()