Home > other >  How to create a Weighted Sum Score based on a second dataset for specific variables
How to create a Weighted Sum Score based on a second dataset for specific variables

Time:01-26

I have to create a weighted sum score (WSum) based on several variables. For instance, mydata has three variables (A, B, and C). I have to create a 'WSum' score in which I multiple each of these variables by the weights present in the weights dataset, and sum all of them (A*0.5 B*1.2 C*2). I have to do all of that without altering other variables in the dataset, for instance, in this case, the variable ID.

PS: I can definitely do it manually mutate(Wsum = (A*0.5 B*1.2 C*2)). But I would like to automatize this process, because, as you might guess, I don't have only 3 variables, and that will not be a one-time job. Thanks in advance.

Example data

Packages required

library(dplyr)

Example Data

mydata <- tibble(ID = (01, 02
                 A = c(1, 2, 3),
                 B = c(2, 5, 4),
                 C = c(3, 6, 1))

>    # A tibble: 3 x 4
>     ID     A     B     C
>  <dbl> <dbl> <dbl> <dbl>
>1     1     1     2     3
>2     2     2     5     6
>3     3     3     4     1

Data Weight Dictionary

weights <- tibble( variables = c("A", "B", "C"),
                   w = c(0.5, 1.2, 2))

># A tibble: 3 x 2
>  variables     w
>  <chr>     <dbl>
>1 A           0.5
>2 B           1.2
>3 C           2 

Expected result

># A tibble: 3 x 5
>     ID     A     B     C  Wsum
>  <dbl> <dbl> <dbl> <dbl> <dbl>
>1     1     1     2     3   8.9
>2     2     2     5     6  19  
>3     3     3     4     1   8.3

CodePudding user response:

In base R, use %*% or crossprod after converting to matrix

mydata$Wsum <- as.vector(as.matrix(mydata[weights$variables]) %*% 
       with(weights, setNames(w, variables)))

-output

mydata$Wsum
[1]  8.9 19.0  8.3

Or using tidyverse, loop across the columns in the 'mydata', get the corresponding value from deframed (converted to named vector) 'weights', multiply and get the rowSums to create the 'Wsum' column

library(dplyr)
library(tibble)
mydata <- mydata %>%
  mutate(Wsum = rowSums(across(all_of(weights$variables),
    ~ .x * deframe(weights)[[cur_column()]])))
mydata
# A tibble: 3 × 4
      A     B     C  Wsum
  <dbl> <dbl> <dbl> <dbl>
1     1     2     3   8.9
2     2     5     6  19  
3     3     4     1   8.3
  •  Tags:  
  • Related