Mutate new column conditioned on values of another column over multiple rows

Time:02-19

library(tibble)
df <- tibble(year = 1951:2000,
             val = rnorm(50))

Assuming the df above, I want to add an extra column (say cond) to the tibble such that its value depends on the two previous rows of the column val.

In other words, if val[i-1] < 0 and val[i-2] < 0, set cond[i] to 1; otherwise set it to 0.

CodePudding user response:

You can use dplyr's lag(), whose second argument sets how many steps to lag the vector by. Your criterion is met when lag(val) < 0 and lag(val, 2) < 0. I have wrapped the result in tidyr::replace_na(), assuming you want 0 for the first two rows, where the criterion is not defined.

library(dplyr)

set.seed(3)

df <- tibble(year = 1951:2000, val = rnorm(50))

# cond is 1 when both of the two previous values are negative, else 0
df %>% mutate(cond = tidyr::replace_na(as.numeric(lag(val) < 0 & lag(val, 2) < 0), 0))

#> # A tibble: 50 x 3
#>     year     val  cond
#>    <int>   <dbl> <dbl>
#>  1  1951 -0.962      0
#>  2  1952 -0.293      0
#>  3  1953  0.259      1
#>  4  1954 -1.15       0
#>  5  1955  0.196      0
#>  6  1956  0.0301     0
#>  7  1957  0.0854     0
#>  8  1958  1.12       0
#>  9  1959 -1.22       0
#> 10  1960  1.27       0
#> # ... with 40 more rows
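
To see why the first two rows need special handling, here is a minimal sketch on a small hand-written vector (the vector x and its values are made up purely for illustration). lag() pads the start with NA, so the comparison is NA in the first two positions here, and replace_na() turns those into 0.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy vector, chosen only to illustrate the mechanics
x <- c(-1, -2, 3, -4, -5, -6)

lag(x)      # NA -1 -2  3 -4 -5
lag(x, 2)   # NA NA -1 -2  3 -4

# Logical AND of the two lagged comparisons, converted to 0/1
raw <- as.numeric(lag(x) < 0 & lag(x, 2) < 0)
raw         # NA NA  1  0  0  1

# Replace the undefined leading entries with 0
cond <- replace_na(raw, 0)
cond        # 0 0 1 0 0 1
```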

CodePudding user response:

Here is a for loop that follows your description. I wasn't sure what to do with the first two values, so I copied the corresponding values of df$val into cond. It is a useful way to learn about loops in R, though more verbose than the great solution offered by Allan Cameron:

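library(dplyr)

# Pre-allocate cond as a numeric column
df$cond <- numeric(nrow(df))

val <- df$val

for (i in seq_along(val)) {
    if (i <= 2) {
        # No two previous rows exist yet, so copy val
        df$cond[i] <- val[i]
    } else if (val[i - 1] < 0 && val[i - 2] < 0) {
        # Both previous values are negative
        df$cond[i] <- 1
    } else {
        df$cond[i] <- 0
    }
}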

Output

> df
# A tibble: 50 × 3
    year     val   cond
   <int>   <dbl>  <dbl>
 1  1951 -0.560  -0.560
 2  1952 -0.230  -0.230
 3  1953  1.56    1    
 4  1954  0.0705  0    
 5  1955  0.129   0    
 6  1956  1.72    0    
 7  1957  0.461   0    
 8  1958 -1.27    0    
 9  1959 -0.687   0    
10  1960 -0.446   1    
# … with 40 more rows

Data

set.seed(123)

df <- tibble(year = 1951:2000, val = rnorm(50))
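
As a quick sanity check, the sketch below runs both approaches on the data above and compares them from row 3 onward, since the two answers treat the first two rows differently. The names vec and loop are introduced here only for illustration.

```r
library(dplyr)

set.seed(123)
df <- tibble(year = 1951:2000, val = rnorm(50))

# Vectorised version, as in the lag() answer (rows 1-2 are NA here)
vec <- as.numeric(lag(df$val) < 0 & lag(df$val, 2) < 0)

# Loop version, filling rows 3 onward only
loop <- rep(NA_real_, nrow(df))
for (i in 3:nrow(df)) {
    loop[i] <- as.numeric(df$val[i - 1] < 0 && df$val[i - 2] < 0)
}

# Compare the two results, ignoring the first two rows
all(vec[-(1:2)] == loop[-(1:2)])   # TRUE
```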
