I am analyzing survey data in which respondents could select more than one county when asked where their organization is located. I want to create a frequency table that counts every time a county is chosen, regardless of whether the respondent chose one county or several.
Example of data:
df <- data.frame(org = c("org_1", "org_2", "org 3", "org 4"),
                 county = c("A, B", "A, D", "B, C", "B"))
Here is the output I would like:
output <- data.frame(county = c("A", "B", "C", "D"),
                     frequency = c(2, 3, 1, 1))
I've tried to use some of the standard frequency table options, such as table(df$county), but this counts "A, B", "A, D", and "B, C" each as unique values, rather than seeing "A", "B", "C", and "D" as individual values.
CodePudding user response:
Use separate_rows to split the county column into one row per county, then get the frequencies with count:
library(tidyr)
library(dplyr)
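df %>%
  # split on commas (plus any following spaces) so multi-word county names stay intact
  separate_rows(county, sep = ",\\s*") %>%
  # one row per county with the number of times it was chosen
  count(county, name = 'frequency')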
-output
# A tibble: 4 × 2
  county frequency
  <chr>      <int>
1 A              2
2 B              3
3 C              1
4 D              1
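If you would rather not load any packages, the same idea can be sketched in base R. This is only one possible alternative, not part of the answer above: it assumes the counties are always separated by commas, and the names counties and freq are made up for illustration.

```r
# Same example data as in the question
df <- data.frame(org = c("org_1", "org_2", "org 3", "org 4"),
                 county = c("A, B", "A, D", "B, C", "B"))

# Split each response on commas, flatten to a single vector,
# and trim the leftover whitespace around each county name
counties <- trimws(unlist(strsplit(as.character(df$county), ",", fixed = TRUE)))

# Tabulate the counties and convert the table to a two-column data frame
freq <- as.data.frame(table(county = counties),
                      responseName = "frequency",
                      stringsAsFactors = FALSE)
print(freq)
```

Here strsplit does the job of separate_rows and table does the job of count. The result is a plain data.frame rather than a tibble.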