Error in producing frequency tables using multiple columns


I am trying to produce a frequency table from multiple columns. I have a large dataset, so I need to first select a few columns, convert the data to long format, and then present a frequency table of only those who responded yes to the specific questions. Please see the data structure below. Note: 1 = yes and 2 = no.

data:

subject_id leq_1_en leq_2_en leq_3_en leq_4_en
191-5467 2 2 1 1
191-6784 1 1 2 2
191-3457 1 1 2 2
191-0987 1 1 2 2
191-1245 1 1 1 1
191-2365 1 2 2 1
191-4589 2 1 1 1
191-9874 1 1 1 1
191-2346 2 2 1 1
191-1256 1 1 1 1
191-6784 2 1 1 1
191-6784 1 1 1 1
191-6784 1 1 1 1
191-6784 1 1 1 1

Desired output:

Variable Frequency
leq_1_en 10
leq_2_en 11
leq_3_en 9
leq_4_en 11

What I have tried so far:

data %>% select(leq_1_en, leq_2_en, leq_3_en, leq_4_en) %>%
  pivot_longer(.,
    names_pattern = "([A-z] )",
    names_to = c("groups")) %>%
  drop_na(value) %>%
  group_by(groups) %>%
  count(value)

output:


Error in `$<-.data.frame`(`*tmp*`, "call_text", value = c("... %>% count(value)", :
  replacement has 5 rows, data has 3

I tried to adapt the code in the linked questions below, but it doesn't work. I am also open to using an apply function if that is easier.

Frequency table when there are multiple columns representing one value (R)

Adding a column of total n for each group in a stacked frequency table

CodePudding user response:

Here are a few options to achieve this:

Base option


colSums(data[,-1]==1)

leq_1_en leq_2_en leq_3_en leq_4_en 
      10       11       10       11
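
If you need this in the same two-column layout as the desired output, the named vector from colSums() can be turned into a data frame. A minimal sketch (the Variable/Frequency column names are simply taken from the desired output above):

freq <- colSums(data[, -1] == 1)
data.frame(Variable = names(freq), Frequency = unname(freq), row.names = NULL)

  Variable Frequency
1 leq_1_en        10
2 leq_2_en        11
3 leq_3_en        10
4 leq_4_en        11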

Tidyverse option


library(tidyverse)

data %>% select(!subject_id) %>%
  pivot_longer(everything(), names_to = c("groups")) %>%
  drop_na(value) %>% 
  filter(value == 1) %>%
  group_by(groups) %>%
  dplyr::summarise(n = n())

# A tibble: 4 x 2
  groups       n
  <chr>    <int>
1 leq_1_en    10
2 leq_2_en    11
3 leq_3_en    10
4 leq_4_en    11
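
If you would rather have the column headers from the desired output straight away, count() also takes a name argument, so the grouping and renaming can be folded into one step (a sketch along the lines of the pipeline above):

data %>% select(!subject_id) %>%
  pivot_longer(everything(), names_to = "Variable") %>%
  filter(value == 1) %>%
  count(Variable, name = "Frequency")

# A tibble: 4 x 2
  Variable Frequency
  <chr>        <int>
1 leq_1_en        10
2 leq_2_en        11
3 leq_3_en        10
4 leq_4_en        11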

Your code with count() works as well instead of summarise(); just make sure to call count() from the dplyr package:

data %>% select(!subject_id) %>%
  pivot_longer(everything(), names_to = c("groups")) %>%
  drop_na(value) %>%
  filter(value == 1) %>%
  group_by(groups) %>%
  dplyr::count(value)

# A tibble: 4 x 3
# Groups:   groups [4]
  groups   value     n
  <chr>    <dbl> <int>
1 leq_1_en     1    10
2 leq_2_en     1    11
3 leq_3_en     1    10
4 leq_4_en     1    11
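
On the error in your original attempt: one likely culprit is the names_pattern = "([A-z] )" regex, which expects a letter followed by a space and therefore does not match any of the leq_*_en column names, so pivot_longer() cannot build the groups column. If you want to keep a names_pattern, a pattern that actually captures the full name works; a sketch assuming the leq_*_en naming shown in your data:

data %>%
  select(starts_with("leq")) %>%
  pivot_longer(everything(),
               names_pattern = "(leq_\\d+_en)",
               names_to = "groups") %>%
  filter(value == 1) %>%
  count(groups)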

Data


data = structure(list(
  subject_id = c("191-5467 ", "191-6784 ", "191-3457 ", "191-0987 ",
                 "191-1245 ", "191-2365 ", "191-4589 ", "191-9874 ",
                 "191-2346 ", "191-1256 ", "191-6784 ", "191-6784 ",
                 "191-6784 ", "191-6784 "),
  leq_1_en = c(2, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1),
  leq_2_en = c(2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1),
  leq_3_en = c(1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1),
  leq_4_en = c(1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
), class = "data.frame", row.names = c(NA, -14L))