Using Mann-Whitney to create a list of p-values

Time:01-28

I am looking for a way to write a for loop that runs a Mann-Whitney-Wilcoxon test in R. The command I have seen online is wilcox.test(), but I need to apply it to large data sets that have thousands of columns. I have not had any luck finding other resources online.

I have this data frame, DF1, with two groups (Sam and Anna) under DF1$Name. A nonparametric two-group comparison is what I want for this data, and I would like to run a for loop to get the p-value for each of the columns (Companies, Store, Cars, and Home) when comparing the two groups (Sam and Anna). Is there an efficient way to do so?

DF1:

Name  Companies  Store  Cars  Home
Sam   23         10     10    8
Anna  21         8      7     4
Anna  22         5      5     5
Sam   24         5      6     8
Anna  45         6      7     4

My goal is to get a list of p-values generated. Any suggestions would be appreciated! Thank you!

# DF1
Name <- c("Sam", "Anna", "Anna", "Sam", "Anna")
Companies <- c(23, 21, 22, 24, 45)
Store <- c(10, 8, 5, 5, 6)
Cars <- c(10, 7, 5, 6, 7)
Home <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Companies, Store, Cars, Home)

I have tried this so far. It definitely doesn't work, but I feel it is close to what I want. The code below is the first part of the test, derived from here. Is there a way to collect all of the p-values in a list next to the descriptors (Companies, Store, Cars, Home)?

DF1$Group <- as.factor(DF1$Name)

Z <- lapply(DF1[-1], function(x){
    wilcox.test(x ~ DF1$Name)
})

CodePudding user response:

Here is one way:

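library(tidyverse)

# Run one test per numeric column, grouped by DF1$Name;
# tied values trigger a warning and a normal approximation
DF1 %>% 
  select(where(is.numeric)) %>%
  map_df(~ broom::tidy(wilcox.test(.x ~ DF1$Name)), .id = 'var')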
# A tibble: 4 × 5
  var       statistic p.value method                                            alternative
  <chr>         <dbl>   <dbl> <chr>                                             <chr>      
1 Companies       2     0.8   Wilcoxon rank sum exact test                      two.sided  
2 Store           2.5   1     Wilcoxon rank sum test with continuity correction two.sided  
3 Cars            2     0.767 Wilcoxon rank sum test with continuity correction two.sided  
4 Home            0     0.128 Wilcoxon rank sum test with continuity correction two.sided 

CodePudding user response:

We may do

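library(dplyr)  # reframe() requires dplyr >= 1.1.0
library(tidyr)
# Each numeric column becomes a one-row tibble of test results,
# which is then pivoted to long form and unpacked into columns
DF1 %>% 
  reframe(across(where(is.numeric), ~ broom::tidy(wilcox.test(.x ~ Name)))) %>% 
  pivot_longer(cols = everything()) %>%
  unpack(where(tibble::is_tibble))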

-output

# A tibble: 4 × 5
  name      statistic p.value method                                            alternative
  <chr>         <dbl>   <dbl> <chr>                                             <chr>      
1 Companies       2     0.8   Wilcoxon rank sum exact test                      two.sided  
2 Store           2.5   1     Wilcoxon rank sum test with continuity correction two.sided  
3 Cars            2     0.767 Wilcoxon rank sum test with continuity correction two.sided  
4 Home            0     0.128 Wilcoxon rank sum test with continuity correction two.sided  
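
For comparison, the same idea can be sketched in base R without any extra packages. This is only a minimal sketch, assuming DF1 is built as in the question and that every numeric column should be tested against DF1$Name. The names num_cols, pvals, and result are illustrative and do not come from the answers above.

```r
# DF1 as defined in the question
DF1 <- data.frame(
  Name      = c("Sam", "Anna", "Anna", "Sam", "Anna"),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)

# Keep only the numeric columns so the grouping column is never tested
num_cols <- names(DF1)[vapply(DF1, is.numeric, logical(1))]

# One test per column, keeping only the p-value.
# suppressWarnings() hides the "cannot compute exact p-value with ties" warning.
pvals <- vapply(
  num_cols,
  function(col) suppressWarnings(wilcox.test(DF1[[col]] ~ DF1$Name)$p.value),
  numeric(1)
)

# Descriptor next to its p-value
result <- data.frame(variable = names(pvals), p.value = unname(pvals))
print(result)
```

Because vapply() returns a named numeric vector, pvals["Cars"] gives the p-value for a single column, and the same call should scale to thousands of columns as long as they are numeric.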