library(dplyr)
df <- tibble(year = 1951:2000, val = rnorm(50))
Assuming the df above, I want to add an extra column (say cond) to the tibble whose value depends on the two previous rows of the column val.
In other words, if val[i-1] < 0 and val[i-2] < 0, cond[i] should be 1; otherwise it should be 0.
CodePudding user response:
You can use lag, which takes a second argument telling the function by how many steps you wish to lag the vector. So if lag(val) < 0 and lag(val, 2) < 0, your criterion is met. I have wrapped this in tidyr::replace_na, assuming that you want 0 for the first two rows, where the criterion is not defined.
library(dplyr)
set.seed(3)
df <- tibble(year = 1951:2000, val = rnorm(50))
df %>% mutate(cond = tidyr::replace_na(as.numeric(lag(val) < 0 & lag(val, 2) < 0), 0))
#> # A tibble: 50 x 3
#>     year     val  cond
#>    <int>   <dbl> <dbl>
#>  1  1951 -0.962      0
#>  2  1952 -0.293      0
#>  3  1953  0.259      1
#>  4  1954 -1.15       0
#>  5  1955  0.196      0
#>  6  1956  0.0301     0
#>  7  1957  0.0854     0
#>  8  1958  1.12       0
#>  9  1959 -1.22       0
#> 10  1960  1.27       0
#> # ... with 40 more rows
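To make it clearer why the first two rows need replace_na, here is a minimal sketch on a small hand-made vector. The vector x is made up purely for illustration and is not part of the question's data; the sketch assumes dplyr and tidyr are installed.

```r
library(dplyr)

# Hypothetical toy vector, chosen only to illustrate the shifting
x <- c(-1, -2, 3, -4, -5, 6)

lag(x)     # NA -1 -2  3 -4 -5   (shifted by one position)
lag(x, 2)  # NA NA -1 -2  3 -4   (shifted by two positions)

# TRUE where both previous values are negative;
# NA in the first two positions, where a lagged value is missing
both_neg <- lag(x) < 0 & lag(x, 2) < 0

# Convert to 0/1 and treat the undefined leading positions as 0
cond <- tidyr::replace_na(as.numeric(both_neg), 0)
cond
```

Without replace_na, the first two entries of cond would stay NA, which may or may not be what you want depending on how you treat rows with no history.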
CodePudding user response:
Here is a for loop that follows your description. I wasn't sure what to do with the first two values, so for those rows I copied the corresponding values of df$val into cond. It is a way to learn about loops in R, but more verbose than the great solution offered by Allan Cameron:
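library(dplyr)
# Pre-allocate the new column as numeric
df$cond <- numeric(nrow(df))
val <- df$val
for (i in seq_along(val)) {
  if (i <= 2) {
    # No two previous rows exist yet, so copy val itself
    df$cond[i] <- val[i]
  } else if (val[i - 1] < 0 && val[i - 2] < 0) {
    # Both previous values are negative
    df$cond[i] <- 1
  } else {
    df$cond[i] <- 0
  }
}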
Output
> df
# A tibble: 50 × 3
    year     val   cond
   <int>   <dbl>  <dbl>
 1  1951 -0.560  -0.560
 2  1952 -0.230  -0.230
 3  1953  1.56    1
 4  1954  0.0705  0
 5  1955  0.129   0
 6  1956  1.72    0
 7  1957  0.461   0
 8  1958 -1.27    0
 9  1959 -0.687   0
10  1960 -0.446   1
# … with 40 more rows
Data
set.seed(123)
df <- tibble(year = 1951:2000, val = rnorm(50))
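For comparison, the same rule can also be written in base R without a loop or dplyr. This is only a sketch under a few assumptions: it uses a plain data.frame instead of a tibble, the helper names neg, prev1 and prev2 are made up for illustration, and it sets the first two rows to 0 rather than copying val.

```r
set.seed(123)
df <- data.frame(year = 1951:2000, val = rnorm(50))

n <- nrow(df)
neg <- df$val < 0                 # is each value negative?

# Shift the logical vector down by one and by two rows,
# padding the undefined leading positions with FALSE
prev1 <- c(FALSE, neg[-n])
prev2 <- c(FALSE, FALSE, neg[-c(n - 1, n)])

# 1 where both previous values are negative, 0 otherwise
df$cond <- as.numeric(prev1 & prev2)
head(df, 10)
```

From row 3 onward this should give the same cond values as the loop; only the treatment of the first two rows differs.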