I am trying to produce a frequency table from multiple columns. I have a large dataset, so I need to first select a few columns, convert the data to long format, and then present a frequency table of only those who responded yes to the specific questions. Please see the data structure below. Please note: 1 = yes and 2 = no.
data:
subject_id | leq_1_en | leq_2_en | leq_3_en | leq_4_en |
---|---|---|---|---|
191-5467 | 2 | 2 | 1 | 1 |
191-6784 | 1 | 1 | 2 | 2 |
191-3457 | 1 | 1 | 2 | 2 |
191-0987 | 1 | 1 | 2 | 2 |
191-1245 | 1 | 1 | 1 | 1 |
191-2365 | 1 | 2 | 2 | 1 |
191-4589 | 2 | 1 | 1 | 1 |
191-9874 | 1 | 1 | 1 | 1 |
191-2346 | 2 | 2 | 1 | 1 |
191-1256 | 1 | 1 | 1 | 1 |
191-6784 | 2 | 1 | 1 | 1 |
191-6784 | 1 | 1 | 1 | 1 |
191-6784 | 1 | 1 | 1 | 1 |
191-6784 | 1 | 1 | 1 | 1 |
Desired output:
Variable | Frequency |
---|---|
leq_1_en | 10 |
leq_2_en | 11 |
leq_3_en | 10 |
leq_4_en | 11 |
What I have tried so far:
data %>% select(leq_1_en, leq_2_en, leq_3_en,leq_4_en) %>%
pivot_longer (.,
names_pattern = "([A-z] )",
names_to = c("groups")) %>%
drop_na(value) %>%
group_by(groups) %>%
count(value)
output:
Error in `$<-.data.frame`(`*tmp*`, "call_text", value = c("... %>% count(value)", :
replacement has 5 rows, data has 3
I tried to adapt the code in the linked questions below; however, it doesn't work. I am also open to using an apply function if that is easier.
Frequency table when there are multiple columns representing one value (R)
Adding a column of total n for each group in a stacked frequency table
CodePudding user response:
Here are a few options to achieve this:
Base option
colSums(data[, -1] == 1)  # compare every question column to 1 ("yes") and sum the TRUEs column-wise
leq_1_en leq_2_en leq_3_en leq_4_en 
      10       11       10       11 
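Since you mentioned being open to an apply-style function, here is a minimal sapply() sketch that produces the same counts (an assumption here is that missing responses should simply be skipped, which is what na.rm = TRUE does):
# Count the 1s ("yes") in each question column, ignoring any NAs
sapply(data[, -1], function(x) sum(x == 1, na.rm = TRUE))
leq_1_en leq_2_en leq_3_en leq_4_en 
      10       11       10       11 
Unlike plain colSums(), the na.rm = TRUE mirrors the drop_na(value) step in your pipeline, so an NA response won't turn a column total into NA.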
Tidyverse option
library(tidyverse)
data %>%
  select(!subject_id) %>%
  pivot_longer(everything(), names_to = "groups") %>%
  drop_na(value) %>%
  filter(value == 1) %>%
  group_by(groups) %>%
  dplyr::summarise(n = n())
# A tibble: 4 x 2
  groups       n
  <chr>    <int>
1 leq_1_en    10
2 leq_2_en    11
3 leq_3_en    10
4 leq_4_en    11
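If you want the headers to match the desired output exactly, you can rename at the end of the pipe. This is just a small sketch built on the same steps; the names Variable and Frequency come from the table in the question:
data %>%
  select(!subject_id) %>%
  pivot_longer(everything(), names_to = "groups") %>%
  drop_na(value) %>%
  filter(value == 1) %>%
  group_by(groups) %>%
  dplyr::summarise(n = n()) %>%
  rename(Variable = groups, Frequency = n)  # rename(new = old)
# A tibble: 4 x 2
  Variable Frequency
  <chr>        <int>
1 leq_1_en        10
2 leq_2_en        11
3 leq_3_en        10
4 leq_4_en        11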
Your code with count() works as well instead of summarise(); just make sure to call count() from the dplyr package:
data %>%
  select(!subject_id) %>%
  pivot_longer(everything(), names_to = "groups") %>%
  drop_na(value) %>%
  filter(value == 1) %>%
  group_by(groups) %>%
  dplyr::count(value)
# A tibble: 4 x 3
# Groups:   groups [4]
  groups   value     n
  <chr>    <dbl> <int>
1 leq_1_en     1    10
2 leq_2_en     1    11
3 leq_3_en     1    10
4 leq_4_en     1    11
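Another compact variant, offered only as a sketch and assuming every question column starts with the "leq" prefix, does the counting inside summarise() with across() and then pivots just the four totals, so no row-level reshaping is needed:
data %>%
  summarise(across(starts_with("leq"), ~ sum(.x == 1, na.rm = TRUE))) %>%  # one total per leq_* column
  pivot_longer(everything(), names_to = "groups", values_to = "n")         # reshape the totals to long
This returns the same four counts (10, 11, 10, 11) in a groups/n tibble.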
Data
data = structure(list(subject_id = c("191-5467 ", "191-6784 ", "191-3457 ",
"191-0987 ", "191-1245 ", "191-2365 ", "191-4589 ", "191-9874 ",
"191-2346 ", "191-1256 ", "191-6784 ", "191-6784 ", "191-6784 ",
"191-6784 "), leq_1_en = c(2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1,
1, 1), leq_2_en = c(2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
leq_3_en = c(1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1), leq_4_en = c(1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-14L))