get a value from column based on string in another column in data frame

Time:06-03

I have a data frame that looks like this:

rrr<-data.frame(a=c(1,2,3), b=c(3,4,5), co=c('a','b','a'))
  a b co
1 1 3  a
2 2 4  b
3 3 5  a

How can I create a new column that, for each row, takes its value from the column named in co? For example, if co == 'a', then newColumn should get the value from column a.

Answer:

Using dplyr

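library(dplyr)
rrr %>%
   rowwise() %>%                              # treat each row as its own group
   mutate(newColumn = cur_data()[[co]]) %>%   # cur_data() is the current row; [[co]] takes the column it names
   ungroup()
# (cur_data() is deprecated since dplyr 1.1.0; pick(everything()) is its replacement)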
# A tibble: 3 × 4
      a     b co    newColumn
  <dbl> <dbl> <chr>     <dbl>
1     1     3 a             1
2     2     4 b             4
3     3     5 a             3
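For comparison, the same per-row lookup (for row i, take the value from the column named by co[i]) can be sketched in base R with mapply(), without dplyr. This is an illustration added here, not part of the answer above.

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c('a', 'b', 'a'))

# For row i, look up the column named by rrr$co[i] and take its i-th value
rrr$newColumn <- mapply(function(i, col) rrr[[col]][i],
                        seq_len(nrow(rrr)), rrr$co,
                        USE.NAMES = FALSE)
rrr$newColumn
# [1] 1 4 3
```

Like rowwise(), this still loops over rows one at a time; it simply avoids the package dependency.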

Answer:

Another dplyr option:

library(dplyr)

rrr %>% 
  mutate(across(c(a,b), ~case_when(co == cur_column() ~ .), .names = 'new_{col}'),
         new_column = coalesce(new_a, new_b), .keep="unused")
  new_a new_b new_column
1     1    NA          1
2    NA     4          4
3     3    NA          3
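The idea behind this answer (one masked copy per candidate column, NA wherever co does not name that column, then coalesce) can also be sketched in base R. Here ifelse() stands in for case_when() and Reduce() for coalesce(); the vector `cols` listing the candidate columns is an assumption of this sketch.

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c('a', 'b', 'a'))
cols <- c("a", "b")  # candidate value columns (assumed)

# One vector per candidate column: its value where co names it, NA elsewhere
masked <- lapply(cols, function(cl) ifelse(rrr$co == cl, rrr[[cl]], NA))

# Coalesce: keep the first non-NA value across the masked vectors
rrr$new_column <- Reduce(function(x, y) ifelse(is.na(x), y, x), masked)
rrr$new_column
# [1] 1 4 3
```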

Answer:

base R

In general, [-indexing can take a matrix as its index, with one column per dimension of the indexed object; for a data.frame, which is always two-dimensional, that means a two-column (row, column) matrix. It may seem odd to describe a data.frame this way, but the behavior follows from it: [.data.frame handles a matrix index by converting the frame with as.matrix() and indexing the result. The original object is not modified, but that internal conversion is what produces character values here. For instance,

rrr[cbind(seq_len(nrow(rrr)), match(rrr$co, colnames(rrr)))]
# [1] "1" "4" "3"

Even though columns a and b are both numeric, the result is character: [-indexing with a matrix i internally converts rrr to a matrix, and because of the character co column, every column is coerced to character.
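A quick way to confirm this in base R: matrix-indexing the data frame gives exactly the same result as indexing as.matrix() of it, and a single character column is enough to make that matrix character.

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c('a', 'b', 'a'))

# One character column makes the whole matrix character
m <- as.matrix(rrr)
typeof(m)
# [1] "character"

# Matrix-indexing the data frame matches indexing that character matrix
idx <- cbind(seq_len(nrow(rrr)), match(rrr$co, colnames(rrr)))
identical(rrr[idx], m[idx])
# [1] TRUE
```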

We can work around this by first subsetting to the numeric columns:

cols <- c("a", "b")
rrr[,cols][cbind(seq_len(nrow(rrr)), match(rrr$co, cols))]
# [1] 1 4 3

(Assign either result with rrr$newColumn <- ... to add it as a column.)
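The subset-first idea can be wrapped in a small reusable function. This is a minimal sketch; the name lookup_by_name and its arguments are hypothetical and not from any package. It relies on the documented behavior that an NA in an index matrix yields NA, so a key naming none of the value columns gives NA rather than an error.

```r
# Hypothetical helper: for each row, return the value from the column whose
# name is stored in `key_col`, restricted to `value_cols` so the result keeps
# their common type instead of being coerced to character.
lookup_by_name <- function(df, key_col, value_cols) {
  vals <- as.matrix(df[value_cols])   # df[...] stays a data frame even for one column
  idx  <- cbind(seq_len(nrow(df)), match(df[[key_col]], value_cols))
  vals[idx]                           # NA where the key names no value column
}

rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c('a', 'b', 'a'))
rrr$newColumn <- lookup_by_name(rrr, "co", c("a", "b"))
rrr$newColumn
# [1] 1 4 3
```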

dplyr #1

We can adapt the above.

Note that [-matrix-indexing does not work on tibbles. There are a couple of workarounds, neither of which seems "awesome" in my book:

  1. Use rrr directly in the pipe. This works only as long as the original frame is a plain data.frame and not a tibble.

  2. Use the more canonical cur_data() inside the mutate call, but wrap it in as.data.frame() to strip the tibble class.

To be safe, I'll use the second option, even though it makes the code a little less elegant.

library(dplyr)
rrr %>%
  mutate(newColumn = as.numeric(
    # as.data.frame() strips the tibble class so that [-matrix-indexing works
    as.data.frame(cur_data())[cbind(row_number(), match(co, names(cur_data())))]
  ))
#   a b co newColumn
# 1 1 3  a         1
# 2 2 4  b         4
# 3 3 5  a         3

dplyr #2

We can generalize TarJae's suggestion a bit:

library(dplyr)
rrr %>%
  mutate(newColumn = apply(across(a:b, ~ case_when(co == cur_column() ~ .)),
                           1, function(z) na.omit(z)[1]))
#   a b co newColumn
# 1 1 3  a         1
# 2 2 4  b         4
# 3 3 5  a         3

A notable side effect of this approach is that the result keeps the desired numeric class (as str() shows).

(A code-golf version might shorten apply(.., 1, function(z) na.omit(z)[1]) to apply(.., 1, na.omit), but that can fail: if co contains a value that names none of the other columns, na.omit returns a length-0 vector for that row, and apply() then returns a list instead of a numeric vector. With na.omit(z)[1], the [1] turns that empty result into NA.)
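The difference between the two apply() forms can be checked in base R alone. The matrix m below is a hypothetical stand-in for what apply() receives from across(a:b, ~ case_when(...)), with a third row whose co value matched neither column.

```r
# Hypothetical masked values: row 3's co matched neither column, so it is all NA
m <- cbind(a = c(1, NA, NA), b = c(NA, 4, NA))

golfed <- apply(m, 1, na.omit)                    # row 3 yields a length-0 vector
safe   <- apply(m, 1, function(z) na.omit(z)[1])  # [1] turns that into NA

is.list(golfed)
# [1] TRUE
# safe is a plain numeric vector: 1, 4, NA
```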

Answer:

A possible solution in base R:

# \(x) requires R >= 4.1.0; x is one row, x["co"] names the target column
rrr$newColumn <- apply(rrr, 1, \(x) as.numeric(x[x["co"]]))
rrr

#>   a b co newColumn
#> 1 1 3  a         1
#> 2 2 4  b         4
#> 3 3 5  a         3
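Why the as.numeric() is needed: apply() coerces a data frame with as.matrix() before looping, and since co is character, every row arrives as a named character vector. A minimal check of what the function sees for the first row:

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c('a', 'b', 'a'))

# The first row as apply() would pass it: c(a = "1", b = "3", co = "a")
x <- as.matrix(rrr)[1, ]

# x["co"] is "a", so x[x["co"]] is x["a"], which is still the string "1"
as.numeric(x[x["co"]])
# [1] 1
```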