I have a survey dataset in which respondents answer an "all that apply" question about their preferences. Each response is stored as a single character string, with the selected options separated by commas.
So, some example responses might be:
- "Networking, Journals, Social Media Groups"
- "Journals"
- "Networking, Social Media Groups"
- "Networking, Journals"
Is there an efficient way to count how often each sub-string appears within the column? The desired output would be
"Networking: 3"
"Journals: 3"
"Social Media Groups: 2"
CodePudding user response:
Data
df <- structure(list(string = c("Networking, Journals, Social Media Groups","Journals", "Networking, Social Media Groups", "Networking, Journals")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 4 x 1
string
<chr>
1 Networking, Journals, Social Media Groups
2 Journals
3 Networking, Social Media Groups
4 Networking, Journals
Code
library(tidyverse)
df %>%
  separate_rows(string, sep = ", ") %>%
  count(string)
# A tibble: 3 x 2
string n
<chr> <int>
1 Journals 3
2 Networking 3
3 Social Media Groups 2
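If the counts should come back ordered from most to least frequent, as in the question's desired output, `count()` takes a `sort = TRUE` argument. Below is a minimal sketch under a couple of assumptions: it loads only `dplyr` and `tidyr` instead of the whole tidyverse, it rebuilds the example data with `tibble()`, and it uses the regular expression `",\\s*"` as the separator so that the amount of whitespace after each comma does not matter. The name `result` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Example responses copied from the question
df <- tibble(string = c(
  "Networking, Journals, Social Media Groups",
  "Journals",
  "Networking, Social Media Groups",
  "Networking, Journals"
))

result <- df %>%
  # One row per selected option; split on a comma plus any whitespace
  separate_rows(string, sep = ",\\s*") %>%
  # Count each option and order the rows by descending count
  count(string, sort = TRUE)

print(result)
```

Options with the same count are not ordered relative to each other in any way you should rely on, so add an explicit `arrange()` if you need a deterministic tie-break.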
CodePudding user response:
We can use base R
table(unlist(strsplit(df$string, ",\\s*")))
-output
           Journals          Networking Social Media Groups
                  3                   3                   2
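To get from the table to the "label: count" strings asked for in the question, one option is to sort the table and paste the names and counts together. This is only a sketch in base R. The names `responses`, `counts` and `labelled` are made up for illustration, and the input is the example data from the question.

```r
# Example responses copied from the question
responses <- c(
  "Networking, Journals, Social Media Groups",
  "Journals",
  "Networking, Social Media Groups",
  "Networking, Journals"
)

# Split on a comma followed by optional whitespace, then tabulate the pieces
counts <- table(unlist(strsplit(responses, ",\\s*")))

# Order the table from most to least frequent
counts <- sort(counts, decreasing = TRUE)

# Build one "label: count" string per option
labelled <- paste0(names(counts), ": ", as.vector(counts))
print(labelled)
```

`as.vector()` drops the table attributes so that `paste0()` works on plain integers. If you would rather have a data frame than strings, `as.data.frame(counts)` should give one row per option.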
data
df <- structure(list(string = c("Networking, Journals, Social Media Groups",
"Journals", "Networking, Social Media Groups", "Networking, Journals"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
CodePudding user response:
Here is a tidyverse alternative:
library(tidyverse)
df %>%
  mutate(string = strsplit(as.character(string), ",")) %>%
  unnest(string) %>%
  count(String = str_trim(string))
String n
<chr> <int>
1 Journals 3
2 Networking 3
3 Social Media Groups 2
data:
df <- structure(list(string = c("Networking, Journals, Social Media Groups",
"Journals", "Networking, Social Media Groups", "Networking, Journals"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))