Tidy Way to Multiply One Set of Columns Against Another Set-CodePudding

I'm working on a survey research project in which we need to multiply one group of columns against another group of columns. I can do this easily in base R, but I can't figure out how to do it within a tidy, pipe-based workflow. I found many solutions for multiplying a set of columns against one column, but not for multiple columns. Any help is greatly appreciated.

The example below demonstrates how I solve the problem in base R.

library(tidyverse)

df <- tibble(
  a1 = c(1, 2, 3, 4, 5),
  a2 = c(5, 4, 3, 2, 1),
  a3 = c(1, 5, 2, 4, 3),
  b1 = c(1, 1, 2, 1, 1),
  b2 = c(3, 3, 5, 4, 1),
  b3 = c(2, 1, 4, 2, 1)
)

new_df <- df[, c("a1", "a2", "a3")] * df[, c("b1", "b2", "b3")]
names(new_df) <- c("c1", "c2", "c3")

^{Created on 2022-06-14 by the reprex package (v2.0.1)}

CodePudding user response：

Not everything needs to be tidy. The base R solution that you have works perfectly fine, is neat and scalable.

The same can be achieved with tidyverse but it would not be neat. Here's a way with across.

library(dplyr)

df %>%
  mutate(across(starts_with('a'), .names = '{sub("a", "c", col)}') * 
         across(starts_with('b'))) %>%
  select(starts_with('c'))

#     c1    c2    c3
#  <dbl> <dbl> <dbl>
#1     1    15     2
#2     2    12     5
#3     6    15     8
#4     4     8     8
#5     5     1     3

The base R code can also be written as -

df %>% select(starts_with('a')) * df %>% select(starts_with('b'))

CodePudding user response：

You could stack the two groups of columns pairwisely, multiply them together, and then pivot the long data to wide.

library(tidyverse)

df %>%
  rowid_to_column("id") %>% 
  pivot_longer(-id, names_to = c(".value", "set"), names_pattern = "(.)(.)") %>%
  mutate(c = a * b) %>%
  pivot_wider(id, names_from = set, values_from = c, names_prefix = "c")

# # A tibble: 5 × 4
#      id    c1    c2    c3
#   <int> <dbl> <dbl> <dbl>
# 1     1     1    15     2
# 2     2     2    12     5
# 3     3     6    15     8
# 4     4     4     8     8
# 5     5     5     1     3

CodePudding user response：

Here is a mostly tidyverse option, except for using split.default. We can split into a list of dataframes based on the ending, then we can use reduce to perform the calculation for each dataframe, then return as 1 single dataframe (and finally add the c prefix to the column names).

library(tidyverse)

df %>%
  split.default(., sub("\\D ", "", names(.))) %>% 
  map_df(., ~ reduce(.x, .f = `*`)) %>% 
  rename_with( ~ paste0("c", .x))

Output

     c1    c2    c3
  <dbl> <dbl> <dbl>
1     1    15     2
2     2    12     5
3     6    15     8
4     4     8     8
5     5     1     3

CodePudding user response：

I agree with @Ronak Shah that sometimes it's easier with base R functions but here is the pipe solution you are asking:

df <- tibble(
  a1 = c(1, 2, 3, 4, 5),
  a2 = c(5, 4, 3, 2, 1),
  a3 = c(1, 5, 2, 4, 3),
  b1 = c(1, 1, 2, 1, 1),
  b2 = c(3, 3, 5, 4, 1),
  b3 = c(2, 1, 4, 2, 1)
)
data.frame(df) %>%
  mutate(c = select(cur_data(), a1:a3)*select(cur_data(), b1:b3)) %>% 
  invoke(.f = data.frame) %>%
  rename_with(~str_remove(.,".a"))

The outputs looks like this:

  a1 a2 a3 b1 b2 b3 c1 c2 c3
1  1  5  1  1  3  2  1 15  2
2  2  4  5  1  3  1  2 12  5
3  3  3  2  2  5  4  6 15  8
4  4  2  4  1  4  2  4  8  8
5  5  1  3  1  1  1  5  1  3