Problem
I want to find an elegant tidyverse solution to create a sum of m products of each n colums. I don't want to use positional matching and it should be generalizable.
I fiddled around with purrr::pmap_dbl(select(., ends_with(i)), prod)
but did not get very far.
Example for m = 3 and n = 2
library(tidyverse)
df <- tibble(
x_0 = c(5,6),
x_1 = c(9,1),
x_2 = c(2,1),
y_0 = c(3,2),
y_1 = c(3,2),
y_2 = c(1,3)
)
df
> df
# A tibble: 2 × 6
# x_0 x_1 x_2 y_0 y_1 y_2
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 5 9 2 3 3 1
# 6 1 1 2 2 3
I want to calculate the sum of the products rowise:
sum_of_products = x_0 * y_0 x_1 * y_1 x_2 y_2
First row: 5*3 9*3 2*2 = 46
; Second row: 6*2 1*2 1*3 = 17
Desired output
df_with_sum_of_products
# x_0 x_1 x_2 y_0 y_1 y_2 sum_of_products
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 5 9 2 3 3 1 46
# 6 1 1 2 2 3 17
CodePudding user response:
To obtain a completely generalizable and robust solution, I think it's best to transform the data frame to something more amenable to the task in hand.
df %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value)
# A tibble: 6 x 4
# Groups: row, index [6]
row index x y
<int> <chr> <dbl> <dbl>
1 1 0 5 3
2 1 1 9 3
3 1 2 2 1
4 2 0 6 2
5 2 1 1 2
6 2 2 1 3
Then calculate the sum of products...
df %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
mutate(product=x * y) %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 2 x 2
row sum_product
<int> <dbl>
1 1 44
2 2 17
This is robust to the number of rows, the number of variable types (eg x
, y
and z
) and the number of indices (eg 1
, 2
and 3
).
Edit
My claim that the solution above is robust respect to number of variable types is false. (Because of the stage in the pipe that reads mutate(product=x * y)
.) Here's a solution that is, together with a modified input dataset to demonstrate that it is.
df1 <- tibble(
x_0 = c(5,6,1,-1), x_1 = c(9,1,1,3), x_2 = c(2,1,3,4),
y_0 = c(3,2,1, 2), y_1 = c(3,2,2,2), y_2 = c(1,3,2,2),
z_0 = c(4,5,1, 3), z_1 = c(3,1,2,1), z_2 = c(2,2,1,3)
)
df1 %>%
mutate(row=row_number()) %>%
pivot_longer(
-row,
names_sep="_",
names_to=c("name", "index")
) %>%
group_by(row, index) %>%
pivot_wider(names_from=name, values_from=value) %>%
group_map(
function(.x, .y, .keep=TRUE) {
.y %>% bind_cols(.x %>% mutate(product = unlist(apply(.x, 1, prod))))
}
) %>% bind_rows() %>%
group_by(row) %>%
summarise(sum_product=sum(product))
# A tibble: 4 x 2
row sum_product
<int> <dbl>
1 1 145
2 2 68
3 3 11
4 4 24
CodePudding user response:
We can use rowSums
Reduce
(but needs split.default
to divide the data.frame into two, i.e., x_
and y_
)
df %>%
select(order(names(.))) %>%
mutate(sum_of_prod = rowSums(
Reduce(
`*`,
split.default(., gsub("_.*", "", names(.)))
)
))
Or, we can use tcrossprod
to compute sum of products
df %>%
select(order(names(.))) %>%
mutate(sum_of_prod = diag(
do.call(
tcrossprod,
lapply(
split.default(., gsub("_.*", "", names(.))),
as.matrix
)
)
))
and we will obtain
# A tibble: 2 × 7
x_0 x_1 x_2 y_0 y_1 y_2 sum_of_prod
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 9 2 3 3 1 44
2 6 1 1 2 2 3 17
CodePudding user response:
A possible solution:
library(dplyr)
df %>%
mutate(sum_prod = rowSums(across(1:3)*across(4:6)))
#> # A tibble: 2 × 7
#> x_0 x_1 x_2 y_0 y_1 y_2 sum_prod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 9 2 3 3 1 44
#> 2 6 1 1 2 2 3 17
Or more generalizable (the relocate
instruction is only needed in cases where x_2
precedes x_1
, for instance):
library(dplyr)
df %>%
relocate(order(names(.))) %>%
mutate(sum_prod = rowSums(across(starts_with("x"))*across(starts_with("y"))))
#> # A tibble: 2 × 7
#> x_0 x_1 x_2 y_0 y_1 y_2 sum_prod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 9 2 3 3 1 44
#> 2 6 1 1 2 2 3 17
CodePudding user response:
another option, for the fun of it:
library(magrittr) ## for the %>% pipe operator
df$sum_of_products <-
df %>%
apply(1, \(r){
r %>%
matrix(.,,2) %>%
apply(., 1, prod) %>%
sum
})
This solution uses base R rowwise apply
to halve each dataframe row into a two-column matrix which is then (another apply
) reduced to rowwise column products and finally summed.
CodePudding user response:
For the fun we could also construct a formula and use rowwise()
:
library(tidyverse)
m <- unique(str_remove(names(df), ".*_"))
n <- unique(str_remove(names(df), "_.*"))
formula <- "0"
for (each_m in m) formula <- paste0(formula, " ", paste0(paste0(n, "_", each_m), collapse = " * "))
df |>
rowwise() |>
mutate(sum = eval(parse_expr(formula)))
ungroup()
Output:
# A tibble: 2 × 7
x_0 x_1 x_2 y_0 y_1 y_2 sum_of_products
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 9 2 3 3 1 44
2 6 1 1 2 2 3 17
Update: General (unelegant) solution.