I ran a statistical test on this data set using the following code:
# DF1
Name <- c("Sam", "Anna", "Anna", "Sam", "Anna")
Companies <- c(23, 21, 22, 24, 45)
Store <- c(10, 8, 5, 5, 6)
Cars <- c(10, 7, 5, 6, 7)
Home <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Companies, Store, Cars, Home)
DF1$Name <- as.factor(DF1$Name)
Z <- lapply(DF1[-1], function(x){
wilcox.test(x ~ DF1$Name)
})
Now Z is a list of lists, one per column name. For example, when I view Z and click on Companies, I see statistic, null.value, and so on. I am trying to unlist them so that each element ends up in its own column, with one row per group (shown below). This is the code I use, but it isn't quite what I am looking for, and I can't find anything else online:
Z_unlisted <- as.data.frame(unlist(Z))
I am not sure why I am so confused by this, since it feels like it should be simple with unlist(), but all of the lists expand into a single column rather than into separate columns.
How can I unlist these so that each category (statistic, parameter, p.value, etc.) is in its own column, and the groupings (Companies, Store, Cars, and Home) are in one column? Example shown (where each column is part of the list):
Companies 2 0.8 Wilcoxon rank sum exact test two.sided
Store 2.5 1 Wilcoxon rank sum test with continuity correction two.sided
Cars 2 0.767 Wilcoxon rank sum test with continuity correction two.sided
Home 0 0.128 Wilcoxon rank sum test with continuity correction two.sided
CodePudding user response:
Instead of calling unlist on the whole list, apply it to each element and rbind the results:
do.call(rbind, lapply(Z, unlist))
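A minimal sketch of taking that one step further, assuming you want a data frame with the grouping in its own column. The column name Variable is just an illustrative choice. Because unlist mixes numbers and strings, the matrix is character, so numeric columns need converting back; the statistic column ends up named statistic.W, and parameter is dropped because it is NULL.

```r
# Rebuild the example data and tests from the question
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)
# suppressWarnings hides the "cannot compute exact p-value with ties" warnings
Z <- suppressWarnings(lapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# One row per tested column; everything is character at this point
m <- do.call(rbind, lapply(Z, unlist))

# Move the row names into a regular column ("Variable" is a hypothetical name)
res <- data.frame(Variable = rownames(m), m, check.names = FALSE)
rownames(res) <- NULL

# Convert the numeric fields back from character
res$p.value <- as.numeric(res$p.value)
res$statistic.W <- as.numeric(res$statistic.W)

res
```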
CodePudding user response:
Use tidy from broom and apply it using map_dfr from purrr:
library(broom)
library(purrr)
map_dfr(DF1[-1], ~ tidy(wilcox.test(.x ~ Name, DF1)), .id = "Companies")
giving
# A tibble: 4 x 5
Companies statistic p.value method alternative
<chr> <dbl> <dbl> <chr> <chr>
1 Companies 2 0.8 Wilcoxon rank sum exact test two.sided
2 Store 2.5 1 Wilcoxon rank sum test with continuit~ two.sided
3 Cars 2 0.767 Wilcoxon rank sum test with continuit~ two.sided
4 Home 0 0.128 Wilcoxon rank sum test with continuit~ two.sided
CodePudding user response:
If you look at the structure of Z you see that it is a list of 4 items, each of which is an instance of an "htest"-classed value.
str(Z)
#---------
List of 4
$ Companies:List of 7
..$ statistic : Named num 2
.. ..- attr(*, "names")= chr "W"
..$ parameter : NULL
..$ p.value : num 0.8
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "location shift"
..$ alternative: chr "two.sided"
..$ method : chr "Wilcoxon rank sum exact test"
..$ data.name : chr "x by DF1$Name"
..- attr(*, "class")= chr "htest"
# -- remaining items have similar structures.
Since each of the sub-items in those values has length 1 (or is NULL, as with parameter), it's easy to construct a matrix using the do.call(rbind, ...) approach suggested by @akrun.
do.call(rbind, Z)
statistic parameter p.value null.value alternative
Companies 2 NULL 0.8 0 "two.sided"
Store 2.5 NULL 1 0 "two.sided"
Cars 2 NULL 0.7670969 0 "two.sided"
Home 0 NULL 0.1281466 0 "two.sided"
method data.name
Companies "Wilcoxon rank sum exact test" "x by DF1$Name"
Store "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"
Cars "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"
Home "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"
Note that the groupings (Companies, Store, Cars, and Home) are not actually a column but are rownames.
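If you do want the groupings as a real column, one option is to pull the fields out of each "htest" element yourself. This is only a sketch in base R; the column name Variable and the choice of which fields to keep are assumptions made to match the layout requested in the question.

```r
# Rebuild the example data and tests from the question
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)
Z <- suppressWarnings(lapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# Extract each field with vapply so every column keeps its proper type
res <- data.frame(
  Variable    = names(Z),   # hypothetical column name for the groupings
  statistic   = vapply(Z, function(h) unname(h$statistic), numeric(1)),
  p.value     = vapply(Z, function(h) h$p.value, numeric(1)),
  method      = vapply(Z, function(h) h$method, character(1)),
  alternative = vapply(Z, function(h) h$alternative, character(1))
)
rownames(res) <- NULL   # the groupings now live in the Variable column

res
```

Unlike the character matrix you get from unlist, statistic and p.value stay numeric here, so they can be filtered or rounded directly.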
You could have obtained a matrix result (although it will be the transpose of the one above) more simply by using sapply instead of lapply, because sapply by default "simplifies" its results if they are all the same length:
( Z <- sapply(DF1[-1], function(x){
wilcox.test(x ~ DF1$Name)
}) )
#----------
Companies Store
statistic 2 2.5
parameter NULL NULL
p.value 0.8 1
null.value 0 0
alternative "two.sided" "two.sided"
method "Wilcoxon rank sum exact test" "Wilcoxon rank sum test with continuity correction"
data.name "x by DF1$Name" "x by DF1$Name"
Cars
statistic 2
parameter NULL
p.value 0.7670969
null.value 0
alternative "two.sided"
method "Wilcoxon rank sum test with continuity correction"
data.name "x by DF1$Name"
Home
statistic 0
parameter NULL
p.value 0.1281466
null.value 0
alternative "two.sided"
method "Wilcoxon rank sum test with continuity correction"
data.name "x by DF1$Name"
Warning messages:
1: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
cannot compute exact p-value with ties
2: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
cannot compute exact p-value with ties
3: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
cannot compute exact p-value with ties
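To get the sapply result back into one-row-per-variable orientation, you can transpose it with t(). A short sketch, using the hypothetical names Z_mat and Z_t; note that the result is still a matrix of mode list, so individual columns need unlist before they behave like ordinary vectors.

```r
# Rebuild the example data from the question
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)

# sapply simplifies the four htest lists into a 7 x 4 list-matrix
Z_mat <- suppressWarnings(sapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# Transpose so that rows are variables and columns are htest fields
Z_t <- t(Z_mat)
dim(Z_t)

# Each cell is a list element, so unlist a column to get a plain vector
p_values <- unlist(Z_t[, "p.value"])
p_values
```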