If this is my dataset
Id Col_A_1 Col_A_2 Col_A_3 ..... Col_A_100
1 87 88 82 88
2 88 82 82 87
3 82 87 NA 82
4 88 87 82 88
5 87 87 87 88
What is the efficient way to execute table function on these columns from Col_A_1 to Col_A_100 ? I am trying to avoid running the table(df$Col_A_1 , useNA ="ifany")
, table(df$Col_A_2 , useNA ="ifany")
, .... table(df$Col_A_100 , useNA ="ifany")
100 times.
Also if possible, I like the output saved in a dataframe .
Expected output
Column 82 85 87 88 Missing
Col_A_1 1 0 2 2 0
Col_A_2 1 0 3 1 0
Col_A_3 3 0 1 0 1
.
.
.
Col_A_100 1 0 1 3 0
Thanks in advance.
# example data
d <- read.table(text = "
Id Col_A_1 Col_A_2 Col_A_3 Col_A_100
1 87 88 82 88
2 88 82 82 87
3 82 87 NA 82
4 88 87 82 88
5 87 87 87 88", header = TRUE)
CodePudding user response:
I just created a small tibble to work with and to illustrate it with.
Tibbles can essentially be considered lists, so lapply
works just fine. Since the result can be cumbersome to work with, I put it in a tibble as a list entry:
library(dplyr)
x = tibble(col1 = sample(100,replace = T),
col2 = sample(100,replace = T),
col3 = sample(100,replace = T),
col4 = sample(100,replace = T))
res = tibble(cols = colnames(x),
tables = lapply(x, function(col) table(col, useNA = "ifany")))
# A tibble: 4 x 2
# cols tables
# <chr> <named list>
# col1 <table [61]>
# col2 <table [69]>
# col3 <table [60]>
# col4 <table [62]>
EDIT: I did not notice the output format requirement. It can be done (perhaps a bit inelegantly) like this:
#I assume it is all numeric values
unique_names = sapply(res$tables, names) %>% purrr::reduce(union) #get all names present
unique_names_sorted = c(sort(as.numeric(unique_names)), if(any(is.na(unique_names))) "NA") # sort them by value and add in NA, if present
#create dummy matrix
mat = matrix(0, nrow = nrow(res), ncol = length(unique_names_sorted))
#assign corresponding names
colnames(mat) = unique_names_sorted
#populate dummy matrix
for (i in 1:nrow(mat)) {
tmp = res$tables[[i]]
if(any(is.na(names(tmp)))) names(tmp)[is.na(names(tmp))] = "NA"
mat[,names(tmp)] = tmp
}
CodePudding user response:
Excluding Id column reshape from wide-to-long using stack, then table to get counts including NAs, t
ranspose to have column names as rows, then convert table object to dataframe:
data.frame(rbind(t(table(stack(d[, -1]), useNA = "always"))))
# X82 X87 X88 NA.
# Col_A_1 1 2 2 0
# Col_A_2 1 3 1 0
# Col_A_3 3 1 0 1
# Col_A_100 1 1 3 0
# NA. 0 0 0 0