I am playing around with the brca dataset in R and am trying to create two standardised mean values for each variable: one for the B group and one for the M group. I then want to calculate the difference between the two standardised means to see which variables differ most between the groups.
I think what I want to do is:
- scale each variable so they are standardised
- group by the outcome (either B or M)
- calculate the mean of each variable for each group
- pivot from wide to long
- I expect that B is one column and M is a second column at this point (and each variable mean is a row, with variable name being row name)
- calculate the absolute difference between means for B & M for each variable and store as new column
- arrange in descending order of that difference
Does my logic sound correct? If so, I 'think' I have managed steps 1-3, but I have never done these calculations before, let alone in R, so I have no idea whether I am on the right track. Would anyone mind reviewing my code to see if it looks right?
Secondly, can someone help me complete the pivot to a long table (my step 4)?
library(tidyverse)
library(purrrlyr)
library(ggplot2)
temp <- dslabs::brca
df <- cbind(as.data.frame(temp$x), outcome = temp$y)
scaled_df <- df %>%
  mutate_if(is.numeric, scale) %>%
  group_by(outcome) %>%
  dmap(mean)
CodePudding user response:
Something like this?
suppressPackageStartupMessages({
  library(tidyverse)
  library(purrrlyr)
})
temp <- dslabs::brca
df <- cbind(as.data.frame(temp$x), outcome = temp$y)
scaled_df <- df %>%
  mutate_if(is.numeric, scale) %>%
  group_by(outcome) %>%
  purrrlyr::dmap(mean)
scaled_df %>%
  pivot_longer(-outcome) %>%
  group_by(name) %>%
  summarise(diff_means = diff(value))
#> # A tibble: 30 × 2
#>    name              diff_means
#>    <chr>                  <dbl>
#>  1 area_mean              1.47
#>  2 area_se                1.13
#>  3 area_worst             1.52
#>  4 compactness_mean       1.23
#>  5 compactness_se         0.605
#>  6 compactness_worst      1.22
#>  7 concave_pts_mean       1.60
#>  8 concave_pts_se         0.843
#>  9 concave_pts_worst      1.64
#> 10 concavity_mean         1.44
#> # … with 20 more rows
Created on 2022-08-04 by the reprex package (v2.0.1)
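Note that diff(value) above gives the M mean minus the B mean for each variable, so it is signed and the rows are not sorted. If you want the remaining steps from the question (B and M as separate columns, an absolute difference, sorted descending), here is a minimal sketch that uses only dplyr and tidyr instead of purrrlyr. The names std_diffs, variable, std_mean and abs_diff are just ones I made up for illustration, and I am assuming dplyr 1.0 or later for across().

```r
library(dplyr)
library(tidyr)

temp <- dslabs::brca
df <- cbind(as.data.frame(temp$x), outcome = temp$y)

std_diffs <- df %>%
  # Standardise every numeric column over the whole data set (mean 0, sd 1);
  # as.numeric() drops the one-column matrix that scale() returns.
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x)))) %>%
  # Mean of each standardised variable within each outcome group.
  group_by(outcome) %>%
  summarise(across(everything(), mean), .groups = "drop") %>%
  # Long form: one row per (outcome, variable) pair.
  pivot_longer(-outcome, names_to = "variable", values_to = "std_mean") %>%
  # Back to one row per variable, with B and M as separate columns.
  pivot_wider(names_from = outcome, values_from = std_mean) %>%
  # Absolute difference between the group means, largest first.
  mutate(abs_diff = abs(M - B)) %>%
  arrange(desc(abs_diff))

std_diffs
```

The pivot_longer() followed by pivot_wider() is effectively a transpose of the two-row summary, which is what gets you one row per variable with B and M side by side.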