Run table for all columns in sequence


If this is my dataset

Id   Col_A_1   Col_A_2    Col_A_3    .....      Col_A_100
1    87        88         82                    88
2    88        82         82                    87
3    82        87         NA                    82
4    88        87         82                    88
5    87        87         87                    88

What is an efficient way to run the table function on columns Col_A_1 through Col_A_100? I am trying to avoid calling table(df$Col_A_1, useNA = "ifany"), table(df$Col_A_2, useNA = "ifany"), ..., table(df$Col_A_100, useNA = "ifany") 100 times.

Also, if possible, I would like the output saved in a data frame.

Expected output

    Column         82        85     87       88     Missing
    Col_A_1        1         0      2        2            0   
    Col_A_2        1         0      3        1            0
    Col_A_3        3         0      1        0            1
    .
    .
    .
    Col_A_100      1         0      1        3            0

Thanks in advance.

# example data
d <- read.table(text = "
Id   Col_A_1   Col_A_2    Col_A_3   Col_A_100
1    87        88         82        88
2    88        82         82        87
3    82        87         NA        82
4    88        87         82        88
5    87        87         87        88", header = TRUE)

CodePudding user response:

I created a small tibble to work with and to illustrate the idea.

Tibbles can essentially be treated as lists, so lapply works just fine. Since the result can be cumbersome to work with, I put it in a tibble as a list-column:

library(dplyr)
x = tibble(col1 = sample(100, replace = TRUE),
           col2 = sample(100, replace = TRUE),
           col3 = sample(100, replace = TRUE),
           col4 = sample(100, replace = TRUE))

res = tibble(cols = colnames(x),
             tables = lapply(x, function(col) table(col, useNA = "ifany")))

# A tibble: 4 x 2
# cols  tables      
# <chr> <named list>
# col1  <table [61]>
# col2  <table [69]>
# col3  <table [60]>
# col4  <table [62]>

EDIT: I did not notice the output-format requirement. It can be done (perhaps a bit inelegantly) like this:

# I assume all values are numeric
unique_names = sapply(res$tables, names) %>% purrr::reduce(union) # collect all value names present across the tables

unique_names_sorted = c(sort(as.numeric(unique_names)), if (any(is.na(unique_names))) "NA") # sort them by value and append NA, if present

# create a zero-filled dummy matrix
mat = matrix(0, nrow = nrow(res), ncol = length(unique_names_sorted))
# assign the corresponding dimension names
colnames(mat) = unique_names_sorted
rownames(mat) = res$cols
# populate the dummy matrix, one row per original column
for (i in 1:nrow(mat)) {
  tmp = res$tables[[i]]
  if (any(is.na(names(tmp)))) names(tmp)[is.na(names(tmp))] = "NA"
  mat[i, names(tmp)] = tmp
}
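If a data frame in the asker's requested shape is wanted, the populated matrix can then be wrapped up with the column names as a first column and the NA count relabelled "Missing" (a small follow-on sketch, assuming the mat and res objects built above):

```r
# convert the count matrix into the requested data-frame layout
out = data.frame(Column = res$cols, mat, check.names = FALSE, row.names = NULL)
names(out)[names(out) == "NA"] = "Missing"  # rename the NA-count column, if present
out
```

check.names = FALSE keeps the numeric column names (e.g. "82") from being mangled into "X82".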

CodePudding user response:

Excluding the Id column, reshape from wide to long using stack, then use table to get counts including NAs, transpose so the column names become rows, and finally convert the table object to a data frame:

data.frame(rbind(t(table(stack(d[, -1]), useNA = "always"))))
#           X82 X87 X88 NA.
# Col_A_1     1   2   2   0
# Col_A_2     1   3   1   0
# Col_A_3     3   1   0   1
# Col_A_100   1   1   3   0
# NA.         0   0   0   0
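Another variant that produces the exact expected layout directly is to coerce each column to a factor with a shared set of levels before tabulating, so every per-column table has the same cells. A sketch using the question's example data d; here the shared levels are taken from the data itself, and extra values of interest (such as the asker's 85) could be appended to lvls manually:

```r
# all distinct values seen across the columns (sort() drops NA)
lvls <- sort(unique(unlist(d[-1])))

# tabulate each column against the common levels, then flip to one row per column
out <- t(sapply(d[-1], function(col)
  table(factor(col, levels = lvls), useNA = "always")))
colnames(out)[is.na(colnames(out))] <- "Missing"  # label the NA count
out
#           82 87 88 Missing
# Col_A_1    1  2  2       0
# Col_A_2    1  3  1       0
# Col_A_3    3  1  0       1
# Col_A_100  1  1  3       0
```

Because every table has identical cells, sapply returns a regular matrix with no spurious NA row; wrap it in as.data.frame() if a data frame is required.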