Home > Software design >  Tidyverse solution for rowise sum of products over multiple columns
Tidyverse solution for rowise sum of products over multiple columns

Time:07-09

Problem

I want to find an elegant tidyverse solution to create a sum of m products of each n colums. I don't want to use positional matching and it should be generalizable.

I fiddled around with purrr::pmap_dbl(select(., ends_with(i)), prod) but did not get very far.

Example for m = 3 and n = 2

library(tidyverse)

df <- tibble(
  x_0 = c(5,6),
  x_1 = c(9,1),
  x_2 = c(2,1),
  y_0 = c(3,2),
  y_1 = c(3,2),
  y_2 = c(1,3)
)
df
> df
# A tibble: 2 × 6
# x_0   x_1   x_2   y_0   y_1   y_2
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#   5     9     2     3     3     1
#   6     1     1     2     2     3

I want to calculate the sum of the products rowise:
sum_of_products = x_0 * y_0 x_1 * y_1 x_2 y_2

First row: 5*3 9*3 2*2 = 46; Second row: 6*2 1*2 1*3 = 17

Desired output

df_with_sum_of_products
# x_0   x_1   x_2   y_0   y_1   y_2  sum_of_products
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>           <dbl>
#   5     9     2     3     3     1               46
#   6     1     1     2     2     3               17

CodePudding user response:

To obtain a completely generalizable and robust solution, I think it's best to transform the data frame to something more amenable to the task in hand.

df %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value)
# A tibble: 6 x 4
# Groups:   row, index [6]
    row index     x     y
  <int> <chr> <dbl> <dbl>
1     1 0         5     3
2     1 1         9     3
3     1 2         2     1
4     2 0         6     2
5     2 1         1     2
6     2 2         1     3

Then calculate the sum of products...

df %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value) %>% 
  mutate(product=x * y) %>% 
  group_by(row) %>% 
  summarise(sum_product=sum(product))
# A tibble: 2 x 2
    row sum_product
  <int>       <dbl>
1     1          44
2     2          17

This is robust to the number of rows, the number of variable types (eg x, y and z) and the number of indices (eg 1, 2 and 3).

Edit

My claim that the solution above is robust respect to number of variable types is false. (Because of the stage in the pipe that reads mutate(product=x * y).) Here's a solution that is, together with a modified input dataset to demonstrate that it is.

df1 <- tibble(
  x_0 = c(5,6,1,-1), x_1 = c(9,1,1,3), x_2 = c(2,1,3,4),
  y_0 = c(3,2,1, 2), y_1 = c(3,2,2,2), y_2 = c(1,3,2,2),
  z_0 = c(4,5,1, 3), z_1 = c(3,1,2,1), z_2 = c(2,2,1,3)

)

df1 %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value) %>% 
  group_map(
    function(.x, .y, .keep=TRUE) {
      .y %>% bind_cols(.x %>% mutate(product = unlist(apply(.x, 1, prod))))
    }
  ) %>% bind_rows() %>% 
  group_by(row) %>% 
  summarise(sum_product=sum(product))
# A tibble: 4 x 2
    row sum_product
  <int>       <dbl>
1     1         145
2     2          68
3     3          11
4     4          24

CodePudding user response:

We can use rowSums Reduce (but needs split.default to divide the data.frame into two, i.e., x_ and y_)

df %>%
  select(order(names(.))) %>%
  mutate(sum_of_prod = rowSums(
    Reduce(
      `*`,
      split.default(., gsub("_.*", "", names(.)))
    )
  ))

Or, we can use tcrossprod to compute sum of products

df %>%
  select(order(names(.))) %>%
  mutate(sum_of_prod = diag(
    do.call(
      tcrossprod,
      lapply(
        split.default(., gsub("_.*", "", names(.))),
        as.matrix
      )
    )
  ))

and we will obtain

# A tibble: 2 × 7
    x_0   x_1   x_2   y_0   y_1   y_2 sum_of_prod
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>       <dbl>
1     5     9     2     3     3     1          44
2     6     1     1     2     2     3          17

CodePudding user response:

A possible solution:

library(dplyr)

df %>% 
  mutate(sum_prod = rowSums(across(1:3)*across(4:6)))

#> # A tibble: 2 × 7
#>     x_0   x_1   x_2   y_0   y_1   y_2 sum_prod
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1     5     9     2     3     3     1       44
#> 2     6     1     1     2     2     3       17

Or more generalizable (the relocate instruction is only needed in cases where x_2 precedes x_1, for instance):

library(dplyr)

df %>%
  relocate(order(names(.))) %>% 
  mutate(sum_prod = rowSums(across(starts_with("x"))*across(starts_with("y"))))

#> # A tibble: 2 × 7
#>     x_0   x_1   x_2   y_0   y_1   y_2 sum_prod
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
#> 1     5     9     2     3     3     1       44
#> 2     6     1     1     2     2     3       17

CodePudding user response:

another option, for the fun of it:

library(magrittr) ## for the %>% pipe operator

df$sum_of_products <- 
df %>%
  apply(1, \(r){
    r %>%
      matrix(.,,2) %>%
      apply(., 1, prod) %>%
      sum
  })

This solution uses base R rowwise apply to halve each dataframe row into a two-column matrix which is then (another apply) reduced to rowwise column products and finally summed.

CodePudding user response:

For the fun we could also construct a formula and use rowwise():

library(tidyverse)

m <- unique(str_remove(names(df), ".*_"))
n <- unique(str_remove(names(df), "_.*"))
formula <- "0"
for (each_m in m) formula <- paste0(formula, "   ", paste0(paste0(n, "_", each_m), collapse = " * "))

df |>
  rowwise() |>
  mutate(sum = eval(parse_expr(formula)))
  ungroup()

Output:

# A tibble: 2 × 7
    x_0   x_1   x_2   y_0   y_1   y_2 sum_of_products
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>           <dbl>
1     5     9     2     3     3     1              44
2     6     1     1     2     2     3              17

Update: General (unelegant) solution.

  • Related