I have a data frame that looks like this:
rrr<-data.frame(a=c(1,2,3), b=c(3,4,5), co=c('a','b','a'))
a b co
1 1 3 a
2 2 4 b
3 3 5 a
How can I create a new column whose value in each row is taken from the column named in co? So if co == 'a', then newColumn should get its value from the 'a' column.
CodePudding user response:
Using dplyr
library(dplyr)
rrr %>%
rowwise %>%
mutate(newColumn = cur_data()[[co]]) %>%
ungroup
# A tibble: 3 × 4
a b co newColumn
<dbl> <dbl> <chr> <dbl>
1 1 3 a 1
2 2 4 b 4
3 3 5 a 3
CodePudding user response:
Another dplyr option:
library(dplyr)
rrr %>%
mutate(across(c(a,b), ~case_when(co == cur_column() ~ .), .names = 'new_{col}'),
new_colum = coalesce(new_a, new_b), .keep="unused")
new_a new_b new_colum
1 1 NA 1
2 NA 4 4
3 3 NA 3
CodePudding user response:
base R
In general, [-indexing can use a matrix for indexing, with as many columns as the original object has dimensions. While it seems odd to say it this way (since a data.frame always has 2 dimensions), one can safely infer from this that it implicitly does matrix-operations on the object when doing this. It doesn't convert the original object, but it does internal casting that will result in character here. For instance,
rrr[cbind(seq_len(nrow(rrr)), match(rrr$co, colnames(rrr)))]
# [1] "1" "4" "3"
Even though columns a and b are both class numeric, the result is cast to character because [-indexing with i=matrix(..) internally converts rrr to a matrix, which up-classes all columns to character (because of the co column).
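To see the coercion described above directly, here is a minimal sketch that calls as.matrix() by hand on the same frame. The names m_all and m_num are only illustrative, and the assumption is that matrix-indexing a data.frame behaves like indexing its as.matrix() result:

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c("a", "b", "a"))

# Converting the whole frame: the character column co forces every cell to character
m_all <- as.matrix(rrr)
typeof(m_all)

# Converting only the numeric columns keeps the values as doubles
m_num <- as.matrix(rrr[, c("a", "b")])
typeof(m_num)
```

The first typeof() call reports character and the second reports double, which is why restricting the frame to the numeric columns before indexing avoids the cast.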
We can work around this by subsetting:
cols <- c("a", "b")
rrr[,cols][cbind(seq_len(nrow(rrr)), match(rrr$co, cols))]
# [1] 1 4 3
(And assignment with rrr$newColumn <- ... for either of those.)
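Spelled out, the assignment could look like the sketch below. The second frame, rrr2, is a hypothetical variant (not from the question) in which co contains a value that names no column, to show what match() and the matrix index do in that case:

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c("a", "b", "a"))
cols <- c("a", "b")

# One (row, column) pair per row; the column position comes from co
idx <- cbind(seq_len(nrow(rrr)), match(rrr$co, cols))
rrr$newColumn <- rrr[, cols][idx]

# Hypothetical frame: "z" is not in cols, so match() returns NA for that row
rrr2 <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c("a", "z", "a"))
idx2 <- cbind(seq_len(nrow(rrr2)), match(rrr2$co, cols))
rrr2$newColumn <- rrr2[, cols][idx2]
```

An index row containing NA yields NA in the result, so the unmatched row ends up with NA in newColumn rather than raising an error.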
dplyr #1
We can adapt the above.
Note that [-matrix-indexing does not work on tibbles. There are a couple of workarounds, neither of which seems "awesome" in my book:
1. Use rrr in the pipe. This works only so long as the original frame is just a data.frame and not a tibble.
2. While we can shift to the more canonical cur_data() inside of the mutate call, we must wrap it to declass it a little.
To be safe, I'll use the second option, even though it makes the code a little less-awesome-looking.
library(dplyr)
rrr %>%
mutate(newColumn = as.numeric(as.data.frame(cur_data())[cbind(row_number(), match(co, names(rrr)))]))
# a b co newColumn
# 1 1 3 a 1
# 2 2 4 b 4
# 3 3 5 a 3
dplyr #2
We can generalize TarJae's suggestion a bit with
library(dplyr)
rrr %>%
mutate(newColumn = apply(across(a:b, ~ case_when(co == cur_column() ~ .)),
1, function(z) na.omit(z)[1]))
# a b co newColumn
# 1 1 3 a 1
# 2 2 4 b 4
# 3 3 5 a 3
A notable side-effect of this is that it preserves the desired numeric class of the columns (as can be seen with str or such).
(While a code-golf approach might reduce it from apply(.., 1, function(z) na.omit(z)[1]) to apply(.., 1, na.omit), the latter can fail: if co includes something not found in the other columns, then the call to na.omit will return a length-0 vector, which will not work. By using na.omit(z)[1], the [1] will force it to NA in that case.)
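The difference between the two forms can be checked on plain vectors, without dplyr. In this small sketch, z_match and z_none are made-up stand-ins for one row of the across() result, with and without a matching column:

```r
# One value survives: co matched column b
z_match <- c(a = NA, b = 4)
# Nothing survives: co matched no column
z_none <- c(a = NA_real_, b = NA_real_)

# na.omit() alone leaves a zero-length vector for the unmatched row
len_none <- length(na.omit(z_none))

# Taking [1] pads the empty result out to a single NA
first_match <- unname(na.omit(z_match)[1])
first_none <- unname(na.omit(z_none)[1])
```

Here len_none is 0, first_match is 4, and first_none is NA, so every row contributes exactly one value and apply() can return a plain vector.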
CodePudding user response:
A possible solution in base R:
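# apply() passes each row as a character vector (rrr is converted to a matrix first);
# x[x["co"]] picks the element named by co, and as.numeric() restores the type.
rrr$newColumn <- apply(rrr, 1, \(x) as.numeric(x[x["co"]]))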
rrr
#> a b co newColumn
#> 1 1 3 a 1
#> 2 2 4 b 4
#> 3 3 5 a 3
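Because apply() works on a character matrix here, the numbers take a round trip through text before as.numeric() converts them back. As an alternative sketch (a different technique, not the answer's own method), vapply() over the row indices reads each value straight from its column and never leaves the numeric type. It assumes every value in co names an existing numeric column:

```r
rrr <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), co = c("a", "b", "a"))

# For row i, look up the column named by co[i] and take its i-th element
rrr$newColumn <- vapply(
  seq_len(nrow(rrr)),
  function(i) rrr[[rrr$co[i]]][i],
  numeric(1)
)
```

The numeric(1) template makes vapply() fail loudly if a lookup returns anything other than a single number, which surfaces a bad value in co early.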