How to create a standardised mean for 2 groups of numerous variables in R?

Time:08-05

I am playing around with the brca dataset in R and am trying to create two standardised mean values for each variable: one for the B group and one for the M group. I then want to calculate the difference between the two standardised means to see which variables differ most between the groups.

I think what I want to do is:

  1. scale each variable so that it is standardised
  2. group by the outcome (either B or M)
  3. calculate the mean of each variable for each group
  4. pivot from wide to long
  5. at this point I expect B to be one column and M a second column, with one row per variable (the variable name being the row name)
  6. calculate the absolute difference between the B and M means for each variable and store it as a new column
  7. arrange in descending order of that difference
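
The steps above can be sketched in base R as a rough cross-check. This is only a minimal illustration: the `toy` data frame and its column names (`radius`, `texture`) are hypothetical stand-ins for the real data, chosen so that the block runs on its own.

```r
# Hypothetical toy data: two numeric variables and a two-level outcome (B / M).
toy <- data.frame(
  radius  = c(1, 2, 3, 6, 7, 8),
  texture = c(5, 4, 6, 5, 6, 4),
  outcome = factor(c("B", "B", "B", "M", "M", "M"))
)

num_cols <- names(toy)[sapply(toy, is.numeric)]

# Step 1: standardise each numeric variable over the whole data set.
# scale() returns a matrix, so it is converted back to a data frame.
scaled <- as.data.frame(scale(toy[num_cols]))

# Steps 2-3: mean of every scaled variable within each outcome group.
# The result is a matrix with one row per group and one column per variable.
group_means <- sapply(scaled, function(v) tapply(v, toy$outcome, mean))

# Steps 4-6: one row per variable, B and M as columns, plus their absolute difference.
result <- data.frame(
  variable = colnames(group_means),
  B = unname(group_means["B", ]),
  M = unname(group_means["M", ])
)
result$abs_diff <- abs(result$M - result$B)

# Step 7: largest difference first.
result <- result[order(-result$abs_diff), ]
print(result)
```

Because every variable is scaled over the whole data set before grouping, each difference is simply the raw difference in group means divided by that variable's overall standard deviation, which is one way to check the numbers by hand.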

Does my logic sound correct? If so, I think I have managed steps 1-3, but I have never done these calculations before, let alone in R, so I have no idea whether I am on the right track. Would anyone mind reviewing my code to see if it looks right?

Secondly, can someone help me complete the pivot to a long table (my step 4)?

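library(tidyverse)  # attaches dplyr, tidyr and ggplot2, among others
library(purrrlyr)   # provides dmap()

temp <- dslabs::brca
# combine the predictor matrix with the outcome factor (levels B and M)
df <- cbind(as.data.frame(temp$x), outcome = temp$y)

# standardise every numeric column, then take the mean of each column per outcome group
scaled_df <- df %>%
  mutate_if(is.numeric, scale) %>%
  group_by(outcome) %>%
  dmap(mean)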

Answer:

Something like this?

suppressPackageStartupMessages({
  library(tidyverse)
  library(purrrlyr)
})

temp <- dslabs::brca
df <- cbind(as.data.frame(temp$x), outcome = temp$y)

scaled_df <- df %>%
  mutate_if(is.numeric, scale) %>%
  group_by(outcome) %>%
  purrrlyr::dmap(mean)

scaled_df %>%
  pivot_longer(-outcome) %>%
  group_by(name) %>%
  # each variable has two rows (B, then M), so diff() returns M - B
  summarise(diff_means = diff(value))
#> # A tibble: 30 × 2
#>    name              diff_means
#>    <chr>                  <dbl>
#>  1 area_mean              1.47 
#>  2 area_se                1.13 
#>  3 area_worst             1.52 
#>  4 compactness_mean       1.23 
#>  5 compactness_se         0.605
#>  6 compactness_worst      1.22 
#>  7 concave_pts_mean       1.60 
#>  8 concave_pts_se         0.843
#>  9 concave_pts_worst      1.64 
#> 10 concavity_mean         1.44 
#> # … with 20 more rows

Created on 2022-08-04 by the reprex package (v2.0.1)
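
The output above is the signed difference (M minus B) in alphabetical order of variable name, whereas steps 6 and 7 in the question ask for the absolute difference sorted in descending order. A possible way to finish those steps, using only dplyr and tidyr rather than purrrlyr, is sketched below. The `toy` tibble and the column names `variable`, `scaled_mean` and `abs_diff` are hypothetical choices made for illustration; with the real data, `df` from above would take the place of `toy`.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `df`, so that the sketch runs on its own.
toy <- tibble(
  radius  = c(1, 2, 3, 6, 7, 8),
  texture = c(5, 4, 6, 5, 6, 4),
  outcome = factor(c("B", "B", "B", "M", "M", "M"))
)

ranked <- toy %>%
  # as.numeric() drops the one-column matrix structure that scale() returns
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) %>%
  group_by(outcome) %>%
  summarise(across(everything(), mean)) %>%
  # long format: one row per outcome/variable pair
  pivot_longer(-outcome, names_to = "variable", values_to = "scaled_mean") %>%
  # back to one row per variable, with B and M as separate columns
  pivot_wider(names_from = outcome, values_from = scaled_mean) %>%
  mutate(abs_diff = abs(M - B)) %>%
  arrange(desc(abs_diff))

print(ranked)
```

Pivoting long and then wide again keeps B and M as named columns, so the subtraction does not depend on the row order within each variable the way diff() does.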
