Home > front end >  Create a new column a fill with values from a set of multiple columns conditional on column names
Create a new column a fill with values from a set of multiple columns conditional on column names

Time:12-28

I have this dataframe:

df <- df <- structure(list(A01 = c(0L, 0L, 2L, 0L, 4L, 1L, 10L, 10L), A02 = c(0L, 
-1L, 0L, 1L, 4L, 4L, 9L, 12L), A03 = c(-5L, -4L, 2L, -4L, 3L, 
2L, 8L, 12L), A04 = c(-1L, -3L, 3L, 1L, 3L, -3L, 9L, 12L), A05 = c(-1L, 
-3L, 1L, -1L, 3L, 0L, 7L, 10L), A06 = c(2L, -3L, 3L, 1L, 4L, 
0L, 7L, 12L), A07 = c(3L, -3L, 3L, 1L, 4L, 0L, 7L, 9L), X = c(2L, 
2L, 6L, 7L, 12L, 15L, 22L, 24L)), class = "data.frame", row.names = c(NA, 
-8L))

  A01 A02 A03 A04 A05 A06 A07  X
1   0   0  -5  -1  -1   2   3  2
2   0  -1  -4  -3  -3  -3  -3  2
3   2   0   2   3   1   3   3  6
4   0   1  -4   1  -1   1   1  7
5   4   4   3   3   3   4   4 12
6   1   4   2  -3   0   0   0 15
7  10   9   8   9   7   7   7 22
8  10  12  12  12  10  12   9 24

I would like to apply this code: from a former question of mine Mutate a new column and paste value from existing columns conditional on string in column names

library(dplyr)
library(stringr)
df %>% 
  rowwise %>%
  mutate(new_col = get(str_c('A0', X))) %>%
  ungroup

I get the error:

Error: Problem with `mutate()` column `new_col`.
i `new_col = get(str_c("A0", X))`.
x object 'A012' not found
i The error occurred in row 5.
Run `rlang::last_error()` to see where the error occurred.

And I know the reason: The reason is that the code try to get a A012 column because of row 5 in X column = 12. But there is no column A012.

Desired_output:

   A01 A02 A03 A04 A05 A06 A07  X new_col
1   0   0  -5  -1  -1   2   3  2    0
2   0  -1  -4  -3  -3  -3  -3  2    -1
3   2   0   2   3   1   3   3  6    3
4   0   1  -4   1  -1   1   1  7    1
5   4   4   3   3   3   4   4 12    NA
6   1   4   2  -3   0   0   0 15    NA
7  10   9   8   9   7   7   7 22    NA
8  10  12  12  12  10  12   9 24    NA

CodePudding user response:

One option to achieve your desired result would be via an if condition:

library(dplyr)
library(stringr)
df %>% 
  rowwise() %>%
  mutate(new_col = if (str_c('A0', X) %in% names(.)) get(str_c('A0', X)) else NA) %>%
  ungroup()
#> # A tibble: 8 × 9
#>     A01   A02   A03   A04   A05   A06   A07     X new_col
#>   <int> <int> <int> <int> <int> <int> <int> <int>   <int>
#> 1     0     0    -5    -1    -1     2     3     2       0
#> 2     0    -1    -4    -3    -3    -3    -3     2      -1
#> 3     2     0     2     3     1     3     3     6       3
#> 4     0     1    -4     1    -1     1     1     7       1
#> 5     4     4     3     3     3     4     4    12      NA
#> 6     1     4     2    -3     0     0     0    15      NA
#> 7    10     9     8     9     7     7     7    22      NA
#> 8    10    12    12    12    10    12     9    24      NA

CodePudding user response:

You can try the following base R code

transform(
  df,
  new_col = df[cbind(seq_along(X), match(paste0("A0", X), names(df)))]
)

which gives

  A01 A02 A03 A04 A05 A06 A07  X new_col
1   0   0  -5  -1  -1   2   3  2       0
2   0  -1  -4  -3  -3  -3  -3  2      -1
3   2   0   2   3   1   3   3  6       3
4   0   1  -4   1  -1   1   1  7       1
5   4   4   3   3   3   4   4 12      NA
6   1   4   2  -3   0   0   0 15      NA
7  10   9   8   9   7   7   7 22      NA
8  10  12  12  12  10  12   9 24      NA

The dplyr version might look like this (sorry that I am not a dplyr master, and there must be a more elegant representation than this)

> df %>%
    mutate(new_col = .[cbind(seq_along(X), match(paste0("A0", X), names(.)))])
  A01 A02 A03 A04 A05 A06 A07  X new_col
1   0   0  -5  -1  -1   2   3  2       0
2   0  -1  -4  -3  -3  -3  -3  2      -1
3   2   0   2   3   1   3   3  6       3
4   0   1  -4   1  -1   1   1  7       1
5   4   4   3   3   3   4   4 12      NA
6   1   4   2  -3   0   0   0 15      NA
7  10   9   8   9   7   7   7 22      NA
8  10  12  12  12  10  12   9 24      NA

CodePudding user response:

You can use purrr::imap_dbl to iterate over the vector df$X and its index, create the new column name, extract the value of that column at the given index, and return a double. In the event that there's no match, NULL will be replaced by the double-type NA. The infix operator %||% is from rlang, exported by purrr; the docs say it's based on the OR operator in Ruby, although I'm familiar with a similar thing in Javascript.

library(purrr)

df$new_col <- imap_dbl(df$X, 
                       function(val, i) df[i, sprintf("Ad", val)] %||% NA_real_)
df
#>   A01 A02 A03 A04 A05 A06 A07  X new_col
#> 1   0   0  -5  -1  -1   2   3  2       0
#> 2   0  -1  -4  -3  -3  -3  -3  2      -1
#> 3   2   0   2   3   1   3   3  6       3
#> 4   0   1  -4   1  -1   1   1  7       1
#> 5   4   4   3   3   3   4   4 12      NA
#> 6   1   4   2  -3   0   0   0 15      NA
#> 7  10   9   8   9   7   7   7 22      NA
#> 8  10  12  12  12  10  12   9 24      NA

I'm making the assumption that the numbers in column names are zero-padded to a width of 2, but if it actually is correct to just paste "A0" to each number, do that instead.

  • Related