Unlisting a dataset to separate columns

Time:01-28

I ran a statistical test on this data set using the following code:

# DF1
Name <- c("Sam", "Anna", "Anna", "Sam", "Anna")
Companies <- c(23, 21, 22, 24, 45)
Store <- c(10, 8, 5, 5, 6)
Cars <- c(10, 7, 5, 6, 7)
Home <- c(8, 4, 5, 8, 4)
DF1 <- data.frame(Name, Companies, Store, Cars, Home)

DF1$Name <- as.factor(DF1$Name)

Z <- lapply(DF1[-1], function(x){
    wilcox.test(x ~ DF1$Name)
})

Now Z is a list that contains one list per column name. For example, when I open Z in the viewer and click Companies, I see statistic and null.value. I am trying to unlist them so that each element ends up in its own column, with one row for the group it belongs to (shown below). The code I use is this, but it isn't quite what I am looking for, and I can't find anything else online.

Z_unlisted <- as.data.frame(unlist(Z))

I am not sure why I am so confused by this, since I feel it should be pretty simple with unlist(), but all of the lists expand into a single column rather than into separate columns.

How can I unlist these so that each category (statistic, parameter, p.value, etc.) is in its own column, and the groupings (Companies, Store, Cars, and Home) are in one column? Example shown (where each column is one element of the list):

Companies       2     0.8   Wilcoxon rank sum exact test                      two.sided  
Store           2.5   1     Wilcoxon rank sum test with continuity correction two.sided  
Cars            2     0.767 Wilcoxon rank sum test with continuity correction two.sided  
Home            0     0.128 Wilcoxon rank sum test with continuity correction two.sided 

CodePudding user response:

Instead of unlisting the whole list at once, unlist each element and then rbind the results:

 do.call(rbind, lapply(Z, unlist))
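
To get from that character matrix to a data frame with the groupings as a real column, something along these lines may work. It is only a sketch: it rebuilds DF1 and Z from the question so that it runs on its own, and the names m, res and Group are made up here for illustration. Because unlist() coerces every element to character, the numeric columns have to be converted back.

```r
# Rebuild the data and the list of tests from the question.
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)
# suppressWarnings() only hides the "cannot compute exact p-value with ties" warnings.
Z <- suppressWarnings(lapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# One row per test; every cell is a character string at this point.
m <- do.call(rbind, lapply(Z, unlist))

# Convert to a data frame and move the row names into a column
# ("Group" is an illustrative name, not something wilcox.test produces).
res <- as.data.frame(m, stringsAsFactors = FALSE)
res <- cbind(Group = rownames(res), res)
rownames(res) <- NULL

# Restore the numeric types that unlist() turned into character.
res$statistic.W <- as.numeric(res$statistic.W)
res$p.value     <- as.numeric(res$p.value)

print(res)
```

The NULL parameter element disappears in unlist(), which is why there is no parameter column in the result.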

CodePudding user response:

Use tidy from broom and apply it using map_dfr from purrr:

library(broom)
library(purrr)

map_dfr(DF1[-1], ~ tidy(wilcox.test(.x ~ Name, DF1)), .id = "Companies")

giving

# A tibble: 4 x 5
  Companies statistic p.value method                                 alternative
  <chr>         <dbl>   <dbl> <chr>                                  <chr>      
1 Companies       2     0.8   Wilcoxon rank sum exact test           two.sided  
2 Store           2.5   1     Wilcoxon rank sum test with continuit~ two.sided  
3 Cars            2     0.767 Wilcoxon rank sum test with continuit~ two.sided  
4 Home            0     0.128 Wilcoxon rank sum test with continuit~ two.sided  

CodePudding user response:

If you look at the structure of Z you see that it is a list of 4 items, each of which is an instance of an "htest"-classed value.

str(Z)
#---------
List of 4
 $ Companies:List of 7
  ..$ statistic  : Named num 2
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.8
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum exact test"
  ..$ data.name  : chr "x by DF1$Name"
  ..- attr(*, "class")= chr "htest"
# -- remaining items have similar structures.

Since each sub-item of those values has length 1 (apart from parameter, which is NULL), it's easy to construct a matrix using the do.call(rbind, ...) approach suggested by @akrun.

do.call(rbind, Z)
          statistic parameter p.value   null.value alternative
Companies 2         NULL      0.8       0          "two.sided"
Store     2.5       NULL      1         0          "two.sided"
Cars      2         NULL      0.7670969 0          "two.sided"
Home      0         NULL      0.1281466 0          "two.sided"
          method                                              data.name      
Companies "Wilcoxon rank sum exact test"                      "x by DF1$Name"
Store     "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"
Cars      "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"
Home      "Wilcoxon rank sum test with continuity correction" "x by DF1$Name"

Note that the groupings (Companies, Store, Cars, and Home) are not actually a column but are rownames.
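
If you want the groupings as an actual column and the numeric fields kept as numbers, one option is to pull out each field yourself with vapply. This is a minimal sketch, assuming the DF1 and Z from the question (rebuilt here so the snippet runs on its own); the helper name pull and the column name Group are hypothetical.

```r
# Rebuild the data and the list of tests from the question.
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)
# suppressWarnings() only hides the ties warnings from wilcox.test.
Z <- suppressWarnings(lapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# pull() is a hypothetical helper: extract one field from every htest in Z.
# vapply checks that each extracted value has the stated type and length 1.
pull <- function(field, type) {
  vapply(Z, function(h) unname(h[[field]]), type, USE.NAMES = FALSE)
}

res <- data.frame(
  Group       = names(Z),
  statistic   = pull("statistic", numeric(1)),
  p.value     = pull("p.value", numeric(1)),
  method      = pull("method", character(1)),
  alternative = pull("alternative", character(1)),
  stringsAsFactors = FALSE
)

print(res)
```

Fields that are NULL, such as parameter here, cannot be extracted this way because vapply insists on a length-1 value, so they are simply left out.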

You could have obtained a matrix result (the transpose of the one above) more simply by using sapply instead of lapply, because sapply by default "simplifies" its results if they are all the same length:

( Z <- sapply(DF1[-1], function(x){
     wilcox.test(x ~ DF1$Name)
 }) )
#----------
            Companies                      Store                                              
statistic   2                              2.5                                                
parameter   NULL                           NULL                                               
p.value     0.8                            1                                                  
null.value  0                              0                                                  
alternative "two.sided"                    "two.sided"                                        
method      "Wilcoxon rank sum exact test" "Wilcoxon rank sum test with continuity correction"
data.name   "x by DF1$Name"                "x by DF1$Name"                                    
            Cars                                               
statistic   2                                                  
parameter   NULL                                               
p.value     0.7670969                                          
null.value  0                                                  
alternative "two.sided"                                        
method      "Wilcoxon rank sum test with continuity correction"
data.name   "x by DF1$Name"                                    
            Home                                               
statistic   0                                                  
parameter   NULL                                               
p.value     0.1281466                                          
null.value  0                                                  
alternative "two.sided"                                        
method      "Wilcoxon rank sum test with continuity correction"
data.name   "x by DF1$Name"                                    
Warning messages:
1: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
  cannot compute exact p-value with ties
3: In wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...) :
  cannot compute exact p-value with ties
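
The sapply result is a matrix whose cells are list elements, so indexing a row gives you a list rather than a plain vector. Here is a small sketch of how one might work with it; M, p and tM are names chosen here for illustration, and suppressWarnings() just hides the ties warnings shown above.

```r
# Rebuild the data from the question.
DF1 <- data.frame(
  Name      = factor(c("Sam", "Anna", "Anna", "Sam", "Anna")),
  Companies = c(23, 21, 22, 24, 45),
  Store     = c(10, 8, 5, 5, 6),
  Cars      = c(10, 7, 5, 6, 7),
  Home      = c(8, 4, 5, 8, 4)
)

# sapply simplifies the equal-length htest lists into a list-matrix:
# one row per htest element, one column per tested variable.
M <- suppressWarnings(sapply(DF1[-1], function(x) wilcox.test(x ~ DF1$Name)))

# A single row is a list; unlist() turns it into a named numeric vector.
p <- unlist(M["p.value", ])
print(p)

# t() flips the orientation to one row per variable, like do.call(rbind, Z).
tM <- t(M)
print(dim(tM))
```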