Home > database >  R total multiple columns at once with n()
R total multiple columns at once with n()

Time:08-06

I don't know if I am missing something very obvious here or not, but I am having trouble getting the desired results format for a count. These are all yes, no or NA answers to a question.

My data looks a bit like:

df <- read.table(text = " A  B  C
0   NA   1
1   0   NA
0   1   0
NA  NA  1
0   0   1
1   0   NA
0   1   NA  ", header = TRUE)

df %>%
group_by(A, B, C)%>% 
  summarise(count = n())

I have also tried

count(A, B, C)

with exactly the same results.

I want to count the total number of 0, 1 and NA responses for each column: (rows and columns are interchangeable here, it's the count of response v column format of the table that I'm after.)

Response 0 1 NA
Column A 4 2 1
Column B 3 2 2
Column C 1 3 3

What I am getting instead is

A  B  C  n
0  0  1  1
0  1  0  1
0  1  NA 1
0  NA 1  1 
1  0  NA 2
NA NA 1  1

In other words, it's counting the number of times each unique combination of ABC appears. How do I get it to focus on counting the columns and not the rows?

CodePudding user response:

You can apply the table() function across the columns:

df <- read.table(text = " A  B  C
0   NA   1
1   0   NA
0   1   0
NA  NA  1
0   0   1
1   0   NA
0   1   NA  ", header = TRUE)


t(apply(df, 2,table, useNA = "always"))
#>   0 1 <NA>
#> A 4 2    1
#> B 3 2    2
#> C 1 3    3

Created on 2022-08-05 by the reprex package (v2.0.1)

One alternate tidyverse solution would be the following:

library(tidyverse)

df <- read.table(text = " A  B  C
0   NA   1
1   0   NA
0   1   0
NA  NA  1
0   0   1
1   0   NA
0   1   NA  ", header = TRUE)


x <- df %>%
  mutate(across(everything(), ~fct_explicit_na(as.factor(.x),"NA"))) %>%
  map(., ~c(table(.x))) %>%
  bind_rows(.id = 'Response') 

x
#> # A tibble: 3 × 4
#>   Response   `0`   `1`  `NA`
#>   <chr>    <int> <int> <int>
#> 1 A            4     2     1
#> 2 B            3     2     2
#> 3 C            1     3     3

Created on 2022-08-05 by the reprex package (v2.0.1)

CodePudding user response:

I guess it might need some data re-shaping if you want to use dplyr::n().

First transform df into a "long" format, you'll get a two-column dataframe, from which we can group by everything (group_by_all()) and do your summarize(n()). Finally, transform it back to a "wide" format.

library(tidyverse)

df %>% pivot_longer(everything(), names_to = "Response") %>% 
  group_by_all() %>% 
  summarize(n = n()) %>% 
  pivot_wider(names_from = "value", values_from = "n")

# A tibble: 3 × 4
# Groups:   Response [3]
  Response   `0`   `1`  `NA`
  <chr>    <int> <int> <int>
1 A            4     2     1
2 B            3     2     2
3 C            1     3     3

CodePudding user response:

Using table and stack you can try:

t(table(stack(df), useNA = "ifany"))

Output

   values
ind 0 1 <NA>
  A 4 2    1
  B 3 2    2
  C 1 3    3

CodePudding user response:

If you find yourself wanting to apply the same operation to multiple columns in your data it could be a hint that you should reshape your data to a "longer" format, such that each row represents a single observation. Once your data is in this format you can use table() to get the summary you're after:

df_tidy <- 
  df %>% 
  pivot_longer(cols = everything(), names_to = "group", values_to = "response")

print(df_tidy)
#> # A tibble: 21 x 2
#>    group response
#>    <chr>    <int>
#>  1 A            0
#>  2 B           NA
#>  3 C            1
#>  4 A            1
#>  5 B            0
#>  6 C           NA
#>  7 A            0
#>  8 B            1
#>  9 C            0
#> 10 A           NA
#> # … with 11 more rows

table(df_tidy, useNA = "ifany")
#>      response
#> group 0 1 <NA>
#>     A 4 2    1
#>     B 3 2    2
#>     C 1 3    3
  • Related