I am looking for a way to run a Mann-Whitney-Wilcoxon test in a for loop in R. The function I have seen online is wilcox.test(), but I need to apply it to large data sets with thousands of columns, and I have not had any luck finding other resources online.
I have this data frame, DF1, with two groups (Sam and Anna) under DF1$Name. A nonparametric two-group test is appropriate for this data, and I want to run a for loop to get the p-value for each of the columns (Companies, Store, Cars, and Home) when comparing the two groups (Sam and Anna). Is there an efficient way to do so?
DF1:
Name | Companies | Store | Cars | Home |
---|---|---|---|---|
Sam | 23 | 10 | 10 | 8 |
Anna | 21 | 8 | 7 | 4 |
Anna | 22 | 5 | 5 | 5 |
Sam | 24 | 5 | 6 | 8 |
Anna | 45 | 6 | 7 | 4 |
My goal is to get a list of the p-values, one per column. Any suggestions would be appreciated! Thank you!
# DF1
Name <- c("Sam", "Anna", "Anna", "Sam", "Anna")
Companies <- c(23, 21, 22, 24, 45)
Store <- c(10, 8, 5, 5, 6)
Cars <- c(10, 7, 5, 6, 7)
Home <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Companies, Store, Cars, Home)
I have tried this so far, and it definitely doesn't work, but I feel it is close to what I want. The code below is the first part of the test and was derived from here. Is there a way to collect all of the p-values in a list next to their descriptors (Companies, Store, Cars, Home)?
DF1$Group <- as.factor(DF1$Name)
Z <- lapply(DF1[-1], function(x){
wilcox.test(x ~ DF1$Name)
})
CodePudding user response:
Here is one way:
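library(tidyverse)
DF1 %>%
  select(where(is.numeric)) %>%   # keep only the numeric columns to test
  # one test per column; tidy each result and store the column name in `var`
  map_df(~ broom::tidy(wilcox.test(.x ~ DF1$Name)), .id = 'var')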
# A tibble: 4 × 5
  var       statistic p.value method                                            alternative
  <chr>         <dbl>   <dbl> <chr>                                             <chr>
1 Companies       2     0.8   Wilcoxon rank sum exact test                      two.sided
2 Store           2.5   1     Wilcoxon rank sum test with continuity correction two.sided
3 Cars            2     0.767 Wilcoxon rank sum test with continuity correction two.sided
4 Home            0     0.128 Wilcoxon rank sum test with continuity correction two.sided
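Since the question asks specifically for a for loop and a plain list of p-values, here is a minimal base R sketch of the same idea. The attempt in the question most likely fails because `DF1[-1]` also picks up the newly added `Group` factor column, which `wilcox.test()` cannot use as a response. The names `group`, `num_cols`, and `p_values` are only illustrative and not from the original post.

```r
# Rebuild the example data so the snippet runs on its own.
DF1 <- data.frame(
  Name      = c("Sam", "Anna", "Anna", "Sam", "Anna"),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)

group    <- factor(DF1$Name)                       # two levels: Anna, Sam
num_cols <- names(DF1)[sapply(DF1, is.numeric)]    # columns to test

# Preallocate a named numeric vector, then fill it one column at a time.
p_values <- setNames(numeric(length(num_cols)), num_cols)
for (col in num_cols) {
  # R warns when ties prevent an exact p-value; the test still runs.
  p_values[col] <- wilcox.test(DF1[[col]] ~ group)$p.value
}

p_values
```

The result is a named numeric vector, so `p_values["Cars"]` or `as.list(p_values)` gives the p-values next to their column names.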
CodePudding user response:
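We may use reframe() with across():
library(dplyr)
library(tidyr)
DF1 %>%
  # one tidy test result (a one-row tibble) per numeric column
  reframe(across(where(is.numeric), ~ broom::tidy(wilcox.test(.x ~ Name)))) %>%
  pivot_longer(cols = everything()) %>%
  # spread each nested result into regular columns
  unpack(where(tibble::is_tibble))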
Output:
# A tibble: 4 × 5
  name      statistic p.value method                                            alternative
  <chr>         <dbl>   <dbl> <chr>                                             <chr>
1 Companies       2     0.8   Wilcoxon rank sum exact test                      two.sided
2 Store           2.5   1     Wilcoxon rank sum test with continuity correction two.sided
3 Cars            2     0.767 Wilcoxon rank sum test with continuity correction two.sided
4 Home            0     0.128 Wilcoxon rank sum test with continuity correction two.sided
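For data with thousands of columns, it may help to wrap the test in a small reusable function that returns only the p-values. The sketch below is one possible way to do that in base R. `wilcox_p_by_group` is a hypothetical helper, not part of any package, and the Benjamini-Hochberg adjustment via `p.adjust()` is an optional extra that the question did not ask for; it is included only because many tests are being run at once.

```r
# Hypothetical helper: one Wilcoxon rank sum p-value per numeric column,
# comparing the two groups found in `group_col`.
wilcox_p_by_group <- function(df, group_col) {
  group <- factor(df[[group_col]])
  stopifnot(nlevels(group) == 2)            # the test compares exactly two groups

  is_num <- vapply(df, is.numeric, logical(1))
  is_num[group_col] <- FALSE                # never test the grouping column itself

  lv <- levels(group)
  p <- vapply(df[is_num], function(x) {
    wilcox.test(x[group == lv[1]], x[group == lv[2]])$p.value
  }, numeric(1))

  data.frame(
    var        = names(p),
    p.value    = unname(p),
    p.adjusted = unname(p.adjust(p, method = "BH")),  # optional correction (assumption)
    row.names  = NULL
  )
}

# Example data from the question.
DF1 <- data.frame(
  Name      = c("Sam", "Anna", "Anna", "Sam", "Anna"),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)

res <- wilcox_p_by_group(DF1, "Name")
res
```

`vapply()` with `numeric(1)` guarantees one number per column, which makes failures easier to spot than with `lapply()` when the input is large.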