dplyr: Populating a column based on some condition by iterating over a vector

Time:05-10

I am looking for a tidyverse solution for populating a column based on a condition by iterating over a vector. I would like a tidy solution because my real data is large and nested, unlike the minimal example below.

Column prev represents the previous state of a system, and new represents the new state. prev and new are always in sequence, but some sequences may loop, as in the example below.
The values 0 and 100 are the start and finish states; the states represented here as letters are the important ones.

For certain combinations of prev and new (i.e. those satisfying both prev_condition and new_condition), I need to iterate over a larger vector vec (it has more elements than there are possible combinations of states) and place its values, in order, in column to_do.

I would like to add that combinations of prev and new do not need to be unique; this is why I added the column change, to indicate every state change that took place.

Here is a minimal example and a solution in base R. I am really hoping for an elegant tidyverse solution. Thank you.

EDIT: I have received two wonderful answers, but my mistake was providing an example vector vec containing ascending numeric values. I have edited my question with a more appropriate vec.

# Minimal reproducible data
df <- data.frame(prev = c("0", rep(letters[1:3], 2), rep(letters[4:10], 3)),
                 new = c(rep(letters[1:3], 2), rep(letters[4:10], 3), "100"),
                 change = 1:28,
                 to_do = rep(NA, 28))

# Vector for iteration
set.seed(101)
vec <- sample(LETTERS, size = 30, replace = TRUE)
vec
#>  [1] "I" "Y" "N" "W" "Q" "Z" "V" "C" "C" "I" "C" "C" "B" "T" "U" "Q" "N" "L" "A"
#> [20] "M" "F" "X" "Z" "P" "U" "J" "Z" "K" "U" "Z"

# Conditions
prev_condition <- c(letters[4:6])   # prev state must be any of: "d" "e" "f"
new_condition <- c(letters[5:7])    # new state must be any of: "e" "f" "g"

# base R solution
# Rows where both the previous and the new state satisfy their conditions
idx <- df$prev %in% prev_condition & df$new %in% new_condition
# Fill those rows, in order, with the first sum(idx) elements of vec
df$to_do[idx] <- vec[seq_len(sum(idx))]

df
#>    prev new change to_do
#> 1     0   a      1  <NA>
#> 2     a   b      2  <NA>
#> 3     b   c      3  <NA>
#> 4     c   a      4  <NA>
#> 5     a   b      5  <NA>
#> 6     b   c      6  <NA>
#> 7     c   d      7  <NA>
#> 8     d   e      8     I
#> 9     e   f      9     Y
#> 10    f   g     10     N
#> 11    g   h     11  <NA>
#> 12    h   i     12  <NA>
#> 13    i   j     13  <NA>
#> 14    j   d     14  <NA>
#> 15    d   e     15     W
#> 16    e   f     16     Q
#> 17    f   g     17     Z
#> 18    g   h     18  <NA>
#> 19    h   i     19  <NA>
#> 20    i   j     20  <NA>
#> 21    j   d     21  <NA>
#> 22    d   e     22     V
#> 23    e   f     23     C
#> 24    f   g     24     C
#> 25    g   h     25  <NA>
#> 26    h   i     26  <NA>
#> 27    i   j     27  <NA>
#> 28    j 100     28  <NA>

Created on 2022-05-09 by the reprex package (v2.0.1)

CodePudding user response:

Using dplyr, we can create the column with replace(): start from a vector of NAs and, at the rows where both conditions are TRUE, fill in a sequence whose length is the number of matching rows (the sum of the logical vector).

library(dplyr)

df %>%
  mutate(to_do = replace(rep(NA_real_, n()),
                         prev %in% prev_condition & new %in% new_condition,
                         seq_len(sum(prev %in% prev_condition & new %in% new_condition))))

-output

  prev new change to_do
1     0   a      1    NA
2     a   b      2    NA
3     b   c      3    NA
4     c   a      4    NA
5     a   b      5    NA
6     b   c      6    NA
7     c   d      7    NA
8     d   e      8     1
9     e   f      9     2
10    f   g     10     3
11    g   h     11    NA
12    h   i     12    NA
13    i   j     13    NA
14    j   d     14    NA
15    d   e     15     4
16    e   f     16     5
17    f   g     17     6
18    g   h     18    NA
19    h   i     19    NA
20    i   j     20    NA
21    j   d     21    NA
22    d   e     22     7
23    e   f     23     8
24    f   g     24     9
25    g   h     25    NA
26    h   i     26    NA
27    i   j     27    NA
28    j 100     28    NA
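
The output above fills to_do with a running count, which matched the original numeric vec. For the edited question, where vec holds arbitrary letters, the same replace() pattern can presumably index into vec instead. The following is a minimal sketch under that assumption; the helper column hit and the result name res are hypothetical names introduced here, not part of the original answer.

```r
library(dplyr)

# Data, vector and conditions as given in the question
df <- data.frame(prev = c("0", rep(letters[1:3], 2), rep(letters[4:10], 3)),
                 new = c(rep(letters[1:3], 2), rep(letters[4:10], 3), "100"),
                 change = 1:28)

set.seed(101)
vec <- sample(LETTERS, size = 30, replace = TRUE)

prev_condition <- letters[4:6]
new_condition <- letters[5:7]

res <- df %>%
  # hit (hypothetical helper): TRUE where both conditions hold
  mutate(hit = prev %in% prev_condition & new %in% new_condition,
         # start from character NAs and fill the matching rows,
         # in order, with the first sum(hit) elements of vec
         to_do = replace(rep(NA_character_, n()), hit,
                         vec[seq_len(sum(hit))])) %>%
  select(-hit)
```

Computing the condition once as hit avoids repeating it inside replace(). This sketch assumes vec has at least as many elements as there are matching rows, as stated in the question.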

CodePudding user response:

Will this work for you?

library(dplyr)

df %>% 
  mutate(to_do = ifelse(new %in% new_condition &
                          prev %in% prev_condition, rank(to_do), to_do),
         to_do = replace(to_do, !is.na(to_do), 
                         seq_len(sum(!is.na(to_do)))))

-output

   prev new change to_do
1     0   a      1    NA
2     a   b      2    NA
3     b   c      3    NA
4     c   a      4    NA
5     a   b      5    NA
6     b   c      6    NA
7     c   d      7    NA
8     d   e      8     1
9     e   f      9     2
10    f   g     10     3
11    g   h     11    NA
12    h   i     12    NA
13    i   j     13    NA
14    j   d     14    NA
15    d   e     15     4
16    e   f     16     5
17    f   g     17     6
18    g   h     18    NA
19    h   i     19    NA
20    i   j     20    NA
21    j   d     21    NA
22    d   e     22     7
23    e   f     23     8
24    f   g     24     9
25    g   h     25    NA
26    h   i     26    NA
27    i   j     27    NA
28    j 100     28    NA
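
As with any answer that returns a running count, this matches the original numeric vec rather than the edited one. One possible alternative for the edited question is to turn the logical condition into positions with cumsum() and use those positions to index vec directly. This is only a sketch: the helper columns hit and pos and the result name res2 are hypothetical, and it assumes vec is at least as long as the number of matching rows.

```r
library(dplyr)

# Data, vector and conditions as given in the question
df <- data.frame(prev = c("0", rep(letters[1:3], 2), rep(letters[4:10], 3)),
                 new = c(rep(letters[1:3], 2), rep(letters[4:10], 3), "100"),
                 change = 1:28)

set.seed(101)
vec <- sample(LETTERS, size = 30, replace = TRUE)

prev_condition <- letters[4:6]
new_condition <- letters[5:7]

res2 <- df %>%
  mutate(hit = prev %in% prev_condition & new %in% new_condition,
         # position in vec for each matching row; NA elsewhere
         pos = if_else(hit, cumsum(hit), NA_integer_),
         # indexing with NA yields NA, so non-matching rows stay empty
         to_do = vec[pos]) %>%
  select(-hit, -pos)
```

Masking the non-matching rows with NA before indexing matters here: a raw cumsum() would contain zeros for the leading rows, and indexing with zero silently drops elements and changes the length of the result.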