I'm trying to create new columns in a data frame by multiplying each column in one selection with each column in another. For example:
df <- as.data.frame(matrix(rep(1:6, 3), nrow = 3,
                           dimnames = list(NULL, letters[1:6])))
df
A data.frame: 3 × 6
a b c d e f
1 4 1 4 1 4
2 5 2 5 2 5
3 6 3 6 3 6
df <- df %>% mutate(df$a:df$c * df$d:df$f)
df
A data.frame: 3 × 15
a b c d e f a*d a*e a*f b*d b*e b*f c*d c*e c*f
1 4 1 4 1 4   4   1   4  16   4  16   4   1   4
2 5 2 5 2 5  10   4  10  25  10  25  10   4  10
3 6 3 6 3 6  18   9  18  36  18  36  18   9  18
I want to find an easy, general way to create product columns and add them to the dataset.
In the example, I try to multiply columns a, b and c with columns d, e and f and add all possible combinations to the data frame. The syntax above obviously doesn't work, so I'm looking for the easiest way to accomplish this.
CodePudding user response:
This isn't a one-liner, but it is straightforward with dplyr and rlang.
# the data
df <- as.data.frame(matrix(rep(1:6, 3), nrow = 3,
                           dimnames = list(NULL, letters[1:6])))
library(dplyr)
library(rlang)
# set the column groups you want to multiply
cols1 <- c("a", "b", "c")
cols2 <- c("d", "e", "f")
# create the multiplication expressions.
col_mult <- set_names(c(outer(cols1, cols2, paste, sep = "*")))
col_expr <- parse_exprs(col_mult)
# use the !!! operator to execute them all
df %>%
  mutate(!!!col_expr)
Which gives the following:
  a b c d e f a*d b*d c*d a*e b*e c*e a*f b*f c*f
1 1 4 1 4 1 4   4  16   4   1   4   1   4  16   4
2 2 5 2 5 2 5  10  25  10   4  10   4  10  25  10
3 3 6 3 6 3 6  18  36  18   9  18   9  18  36  18
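To see why the new columns come out in the order a*d, b*d, c*d, ..., it can help to look at the intermediate name vector. The following is a minimal base R sketch (the names m and row_major are illustrative and not part of the answer above). It also shows one possible way to get the a*d, a*e, a*f order from the question, by transposing before flattening.

```r
cols1 <- c("a", "b", "c")
cols2 <- c("d", "e", "f")

# outer() pastes every element of cols1 with every element of cols2,
# giving a 3 x 3 character matrix (rows follow cols1, columns follow cols2)
m <- outer(cols1, cols2, paste, sep = "*")

# c() drops the dim attribute and flattens column by column
col_mult <- c(m)
col_mult
# "a*d" "b*d" "c*d" "a*e" "b*e" "c*e" "a*f" "b*f" "c*f"

# transposing first flattens row by row instead (illustrative alternative)
row_major <- c(t(m))
row_major
# "a*d" "a*e" "a*f" "b*d" "b*e" "b*f" "c*d" "c*e" "c*f"
```

Passing set_names(row_major) to parse_exprs() instead of col_mult should then produce the columns in the order shown in the question.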
If you do this often or in more complex cases, you could go all out and wrap it in a function so that you can use the tidyselect helpers. Again, it's not the simplest approach, but it fits the bill of a "general way".
mutate_product <- function(df, cols_x, cols_y) {
  .cols_x <- names(tidyselect::eval_select(enexpr(cols_x), df))
  .cols_y <- names(tidyselect::eval_select(enexpr(cols_y), df))
  col_mult <- set_names(c(outer(.cols_x, .cols_y, paste, sep = "*")))
  col_expr <- parse_exprs(col_mult)
  mutate(df, !!!col_expr)
}
df %>%
  mutate_product(a:c, starts_with("d"))
#   a b c d e f a*d b*d c*d
# 1 1 4 1 4 1 4   4  16   4
# 2 2 5 2 5 2 5  10  25  10
# 3 3 6 3 6 3 6  18  36  18
CodePudding user response:
You could do this in base R with:
cbind(df, setNames(outer(1:3, 4:6, function(x, y) df[x] * df[y]),
                   as.vector(outer(letters[1:3], letters[4:6], paste, sep = " * "))))
#>   a b c d e f a * d b * d c * d a * e b * e c * e a * f b * f c * f
#> 1 1 4 1 4 1 4     4    16     4     1     4     1     4    16     4
#> 2 2 5 2 5 2 5    10    25    10     4    10     4    10    25    10
#> 3 3 6 3 6 3 6    18    36    18     9    18     9    18    36    18
Data
df <- setNames(as.data.frame(matrix(1:6, ncol = 6, nrow = 3)), letters[1:6])
df
#>   a b c d e f
#> 1 1 4 1 4 1 4
#> 2 2 5 2 5 2 5
#> 3 3 6 3 6 3 6
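If the outer() call feels too compact, the same idea can be sketched with an explicit loop over all name pairs. Note that add_products is a hypothetical helper name, not an existing function. The sketch assumes the selected columns are numeric and are passed as character vectors of column names.

```r
# Hypothetical helper (not from any package): append every pairwise
# product of two sets of columns, addressed by name.
add_products <- function(data, cols_x, cols_y, sep = "*") {
  # all (x, y) name pairs; x varies fastest, matching the order from outer()
  pairs <- expand.grid(x = cols_x, y = cols_y, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(pairs))) {
    x <- pairs$x[i]
    y <- pairs$y[i]
    # [[<- keeps the new column name exactly as given, e.g. "a*d"
    data[[paste(x, y, sep = sep)]] <- data[[x]] * data[[y]]
  }
  data
}

df <- as.data.frame(matrix(rep(1:6, 3), nrow = 3,
                           dimnames = list(NULL, letters[1:6])))
res <- add_products(df, c("a", "b", "c"), c("d", "e", "f"))
names(res)
```

Because the columns are looked up with [[ ]] rather than parsed as code, this variant should also cope with column names that are not syntactically valid in R.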
CodePudding user response:
Let's try
p <- data.frame(
  lapply(
    df[4:6],
    function(x) x * df[1:3]
  )
)
cbind(
  df,
  setNames(p, gsub("(\\w+)\\.(\\w+)", "\\2*\\1", names(p)))
)
which gives
  a b c d e f a*d b*d c*d a*e b*e c*e a*f b*f c*f
1 1 4 1 4 1 4   4  16   4   1   4   1   4  16   4
2 2 5 2 5 2 5  10  25  10   4  10   4  10  25  10
3 3 6 3 6 3 6  18  36  18   9  18   9  18  36  18
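The rename step relies on how data.frame() names columns when it is given a named list of data frames: each name becomes "outer.inner", such as "d.a". Here is a small sketch of those intermediate names, assuming the same df as in the question and column names that contain no dots (new_names is an illustrative variable name):

```r
df <- as.data.frame(matrix(rep(1:6, 3), nrow = 3,
                           dimnames = list(NULL, letters[1:6])))

# one data frame per column of d:f, each holding that column times a:c
p <- data.frame(lapply(df[4:6], function(x) x * df[1:3]))
names(p)
# "d.a" "d.b" "d.c" "e.a" "e.b" "e.c" "f.a" "f.b" "f.c"

# swap the two captured parts around the dot and join them with "*"
new_names <- gsub("(\\w+)\\.(\\w+)", "\\2*\\1", names(p))
new_names
# "a*d" "b*d" "c*d" "a*e" "b*e" "c*e" "a*f" "b*f" "c*f"
```

If the real column names contain dots themselves, the pattern would need adjusting, since the first dot is treated as the separator here.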