I don't know if I am missing something very obvious here or not, but I am having trouble getting the desired results format for a count. These are all yes, no or NA answers to a question.
My data looks a bit like:
df <- read.table(text = " A B C
0 NA 1
1 0 NA
0 1 0
NA NA 1
0 0 1
1 0 NA
0 1 NA ", header = TRUE)
df %>%
group_by(A, B, C)%>%
summarise(count = n())
I have also tried
count(A, B, C)
with exactly the same results.
I want to count the total number of 0, 1 and NA responses for each column: (rows and columns are interchangeable here, it's the count of response v column format of the table that I'm after.)
Response 0 1 NA
Column A 4 2 1
Column B 3 2 2
Column C 1 3 3
What I am getting instead is
A B C n
0 0 1 1
0 1 0 1
0 1 NA 1
0 NA 1 1
1 0 NA 2
NA NA 1 1
In other words, it's counting the number of times each unique combination of ABC appears. How do I get it to focus on counting the columns and not the rows?
CodePudding user response:
You can apply the table()
function across the columns:
df <- read.table(text = " A B C
0 NA 1
1 0 NA
0 1 0
NA NA 1
0 0 1
1 0 NA
0 1 NA ", header = TRUE)
t(apply(df, 2,table, useNA = "always"))
#> 0 1 <NA>
#> A 4 2 1
#> B 3 2 2
#> C 1 3 3
Created on 2022-08-05 by the reprex package (v2.0.1)
One alternate tidyverse solution would be the following:
library(tidyverse)
df <- read.table(text = " A B C
0 NA 1
1 0 NA
0 1 0
NA NA 1
0 0 1
1 0 NA
0 1 NA ", header = TRUE)
x <- df %>%
mutate(across(everything(), ~fct_explicit_na(as.factor(.x),"NA"))) %>%
map(., ~c(table(.x))) %>%
bind_rows(.id = 'Response')
x
#> # A tibble: 3 × 4
#> Response `0` `1` `NA`
#> <chr> <int> <int> <int>
#> 1 A 4 2 1
#> 2 B 3 2 2
#> 3 C 1 3 3
Created on 2022-08-05 by the reprex package (v2.0.1)
CodePudding user response:
I guess it might need some data re-shaping if you want to use dplyr::n()
.
First transform df
into a "long" format, you'll get a two-column dataframe, from which we can group by everything (group_by_all()
) and do your summarize(n())
. Finally, transform it back to a "wide" format.
library(tidyverse)
df %>% pivot_longer(everything(), names_to = "Response") %>%
group_by_all() %>%
summarize(n = n()) %>%
pivot_wider(names_from = "value", values_from = "n")
# A tibble: 3 × 4
# Groups: Response [3]
Response `0` `1` `NA`
<chr> <int> <int> <int>
1 A 4 2 1
2 B 3 2 2
3 C 1 3 3
CodePudding user response:
Using table
and stack
you can try:
t(table(stack(df), useNA = "ifany"))
Output
values
ind 0 1 <NA>
A 4 2 1
B 3 2 2
C 1 3 3
CodePudding user response:
If you find yourself wanting to apply the same operation to multiple columns in your data it could be a hint that you should reshape your data to a "longer" format, such that each row represents a single observation. Once your data is in this format you can use table()
to get the summary you're after:
df_tidy <-
df %>%
pivot_longer(cols = everything(), names_to = "group", values_to = "response")
print(df_tidy)
#> # A tibble: 21 x 2
#> group response
#> <chr> <int>
#> 1 A 0
#> 2 B NA
#> 3 C 1
#> 4 A 1
#> 5 B 0
#> 6 C NA
#> 7 A 0
#> 8 B 1
#> 9 C 0
#> 10 A NA
#> # … with 11 more rows
table(df_tidy, useNA = "ifany")
#> response
#> group 0 1 <NA>
#> A 4 2 1
#> B 3 2 2
#> C 1 3 3