Multiply values in a data frame (based on a specific grouping) by a separate matrix using dplyr

Time:11-10

I need to multiply values in a data frame (based on a specific grouping) by a separate matrix that puts some kind of weights on those values. The multiplication is part of a function that I wrote. I know how to do this in the most basic way. But I cannot understand how I can do it in a more realistic setting. I hope my example makes this problem clear.

I have the following example dataset:

library(dplyr)  # provides tibble(), %>%, arrange(), group_by(), summarise()

set.seed(45)
tibble(site = rep(LETTERS[1:3], each = 6),
       name = rep(letters[10:15], 3),
       size = runif(18)) %>%
  arrange(site, name) -> d_tibble

I also have a matrix that could represent some kind of weights:

d_matrix <- matrix(0, 6, 6)
diag(d_matrix) <- 1
rownames(d_matrix) <- letters[10:15]
colnames(d_matrix) <- letters[10:15]

d_matrix
##   j k l m n o
## j 1 0 0 0 0 0
## k 0 1 0 0 0 0
## l 0 0 1 0 0 0
## m 0 0 0 1 0 0
## n 0 0 0 0 1 0
## o 0 0 0 0 0 1

I also have a function that normalises the vector a to proportions p and then multiplies p by the matrix b:

test_fct <- function(a, b) {
  p <- a / sum(a)
  sum(p * (p %*% b))
}
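As a quick check of what test_fct() computes: sum(p * (p %*% b)) is the quadratic form p b p', so with an identity matrix it reduces to the sum of squared proportions. The sketch below uses made-up sizes and a hypothetical weight matrix w (neither is from the dataset above) to illustrate this.

```r
# Function from the question: normalise a to proportions, then weight by b.
test_fct <- function(a, b) {
  p <- a / sum(a)
  sum(p * (p %*% b))
}

a <- c(2, 3, 5)        # illustrative sizes, not taken from d_tibble
p <- a / sum(a)        # proportions 0.2, 0.3, 0.5

# With an identity matrix the result is the sum of squared proportions.
res_identity <- test_fct(a, diag(3))
res_identity           # 0.38 = 0.04 + 0.09 + 0.25

# With a general matrix it equals the quadratic form p %*% w %*% p.
# w is a hypothetical symmetric weight matrix, used only for illustration.
w <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.5,
              0.0, 0.5, 1.0), nrow = 3, byrow = TRUE)
res_weighted <- test_fct(a, w)
quad_form    <- drop(p %*% w %*% p)
all.equal(res_weighted, quad_form)
```

Because the result depends on which entry of p meets which row and column of b, the matrix passed to the function has to be aligned with the names in each group.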

Then I want to do something like this, i.e. using my function in summarise():

#d_tibble %>%
#  group_by(site) %>%
#  summarise(y = test_fct(size, b))

But I don't know how to get b, i.e. the matrix, into my custom function so that its column names match the values of the name variable within each site group.

One way I tried was to merge the matrix onto the data frame - that way I have everything in one data frame:

d_tibble %>%
  left_join(d_matrix %>%
              as_tibble() %>%
              mutate(name = colnames(d_matrix)),
            by = "name") -> tibble_matrix_join

Then I have everything together, but I still need to access the unique values of the name variable within each site group in order to select the correct columns (j, k, l, m, n, o) for the vector/matrix multiplication in my function test_fct():

#tibble_matrix_join %>%
#  group_by(site) %>%
#  summarise(result = test_fct(size, b))

I tried to check if the general set-up works, that is for only one site and including all names in the matrix, and it does:

d_tibble %>% 
    filter(site == "A") %>% 
    pull(size) -> my_x 

test_fct(my_x, d_matrix)
## [1] 0.1858158

my_p <- my_x/sum(my_x)
sum(my_p * (my_p %*% d_matrix))
## [1] 0.1858158

CodePudding user response:

In the example, every column of d_matrix appears in the 'name' column of the tibble for every 'site'. If that is not the case, we can subset the matrix to the names present in each group:

library(dplyr)
d_tibble %>%
  group_by(site) %>%
  summarise(out = test_fct(size,
                           d_matrix[intersect(row.names(d_matrix), name),
                                    intersect(colnames(d_matrix), name),
                                    drop = FALSE]),
            .groups = "drop")

-output

# A tibble: 3 × 2
  site    out
  <chr> <dbl>
1 A     0.186
2 B     0.264
3 C     0.218
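To see what the subsetting inside summarise() does, it can be run on its own. The sketch below rebuilds d_matrix and uses a hypothetical group in which only three of the six names occur; the variable names present, sub and one are illustrative.

```r
# Rebuild the 6 x 6 identity weight matrix from the question.
d_matrix <- diag(6)
dimnames(d_matrix) <- list(letters[10:15], letters[10:15])

# Hypothetical group in which only three of the six names are present.
present <- c("j", "l", "o")

# Keep only the rows and columns whose names occur in the group.
sub <- d_matrix[intersect(rownames(d_matrix), present),
                intersect(colnames(d_matrix), present),
                drop = FALSE]
dim(sub)        # 3 3
rownames(sub)   # "j" "l" "o"

# drop = FALSE keeps the result a matrix even when a single name is left.
one <- d_matrix["k", "k", drop = FALSE]
is.matrix(one)                   # TRUE
is.matrix(d_matrix["k", "k"])    # FALSE
```

The submatrix then has as many rows and columns as the group has distinct matching names, which is what test_fct() needs for p %*% b to be conformable.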

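-testing on a smaller sample, where some names may be missing from a site (slice_sample() is random, so the exact values will vary between runs)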

d_tibble %>%
  slice_sample(n = 12) %>%
  arrange(site, name) %>%
  group_by(site) %>%
  summarise(out = test_fct(size,
                           d_matrix[intersect(row.names(d_matrix), name),
                                    intersect(colnames(d_matrix), name),
                                    drop = FALSE]),
            .groups = "drop")

-output

# A tibble: 3 × 2
  site    out
  <chr> <dbl>
1 A     0.227
2 B     0.416
3 C     0.481
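One thing to keep in mind: intersect() returns the names in the order of the matrix, so the approach above relies on the rows of each group being sorted the same way (which arrange(site, name) ensures here). A possible variant, sketched below under the assumption that every name in the data exists in the matrix, is to index the matrix by name directly, so rows and columns follow the order of size within the group. The data d and the weights w are hypothetical and chosen to be non-identity so that ordering matters.

```r
library(dplyr)

# Function from the question.
test_fct <- function(a, b) {
  p <- a / sum(a)
  sum(p * (p %*% b))
}

# Small illustrative data; rows are deliberately NOT sorted by name.
d <- tibble(site = c("A", "A", "A", "B", "B"),
            name = c("l", "j", "k", "k", "j"),
            size = c(3, 1, 2, 4, 4))

# Hypothetical symmetric weights with row and column names.
w <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("j", "k", "l"), c("j", "k", "l")))

# Index by the group's own name vector: rows/columns follow the order of size.
# Assumption: all names exist in w; otherwise this errors ("subscript out of bounds").
res <- d %>%
  group_by(site) %>%
  summarise(out = test_fct(size, w[name, name, drop = FALSE]),
            .groups = "drop")

# Same computation on sorted data, to confirm row order does not matter.
res_sorted <- d %>%
  arrange(site, name) %>%
  group_by(site) %>%
  summarise(out = test_fct(size, w[name, name, drop = FALSE]),
            .groups = "drop")

all.equal(res$out, res_sorted$out)
```

If some names may be absent from the matrix, the intersect() version is the safer choice, provided the data are arranged by name first.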