Establishing counts within inconsistent character strings in R?

I have a survey dataset in which respondents pick from an "all that apply" list of preferences. Each response is stored as a single character string, with the selected options separated by commas.

So, some example responses might be:

"Networking, Journals, Social Media Groups"
"Journals"
"Networking, Social Media Groups"
"Networking, Journals"

Is there a way to efficiently get a count for each sub-string that appears within the column? For the example above, the desired output would be

 "Networking: 3"
 "Journals: 3"
 "Social Media Groups: 2"

CodePudding user response：

Data

df <- structure(list(string = c("Networking, Journals, Social Media Groups","Journals", "Networking, Social Media Groups", "Networking, Journals")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))


# A tibble: 4 x 1
  string                                   
  <chr>                                    
1 Networking, Journals, Social Media Groups
2 Journals                                 
3 Networking, Social Media Groups          
4 Networking, Journals

Code

library(tidyverse)

df %>% 
  separate_rows(string, sep = ", ") %>% 
  count(string)

# A tibble: 3 x 2
  string                  n
  <chr>               <int>
1 Journals                3
2 Networking              3
3 Social Media Groups     2
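
In tidyr 1.3.0 and later, `separate_rows()` is marked as superseded in favour of `separate_longer_delim()`. Below is a minimal sketch of the same idea with the newer function, assuming that tidyr version is installed. The `sort = TRUE` argument and the `label` column are additions of mine, meant to get closer to the "Option: n" format asked for in the question.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df <- tibble(string = c("Networking, Journals, Social Media Groups",
                        "Journals",
                        "Networking, Social Media Groups",
                        "Networking, Journals"))

out <- df %>%
  # one row per selected option; delim is a fixed string, not a regex
  separate_longer_delim(string, delim = ", ") %>%
  # count each option, most frequent first
  count(string, sort = TRUE) %>%
  # label is an illustrative column: "Option: n"
  mutate(label = paste0(string, ": ", n))

print(out$label)
```

Because `delim` is matched literally here, this only works when every separator is exactly a comma followed by one space.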

CodePudding user response：

We can use base R

table(unlist(strsplit(df$string, ",\\s+")))

Output

            Journals          Networking Social Media Groups 
                  3                   3                   2
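
If the real responses are less tidy than the example, the split may need to be more forgiving. Here is a rough base R sketch, assuming the messiness is limited to uneven spacing around commas, trailing commas, and missing values. The `responses` vector is made up for illustration and is not from the question.

```r
# Hypothetical messy responses (not from the question): uneven spacing,
# a trailing comma, and a missing value
responses <- c("Networking,Journals ,  Social Media Groups",
               "Journals,",
               " Networking, Social Media Groups",
               NA,
               "Networking, Journals")

# Split on commas together with any whitespace around them
parts <- unlist(strsplit(responses, "\\s*,\\s*"))

# Trim leftover whitespace at the edges, then drop NA and empty pieces
parts <- trimws(parts)
parts <- parts[!is.na(parts) & nzchar(parts)]

# Tabulate and sort from most to least frequent
counts <- sort(table(parts), decreasing = TRUE)
print(counts)
```

This does not handle differences in spelling or capitalisation. Those would need an extra normalisation step (for example `tolower()`) before calling `table()`.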

Data

df <- structure(list(string = c("Networking, Journals, Social Media Groups", 
"Journals", "Networking, Social Media Groups", "Networking, Journals"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))

CodePudding user response：

Here is a tidyverse alternative:

library(tidyverse)
df %>% 
    mutate(string = strsplit(as.character(string), ",")) %>% 
    unnest(string) %>% 
    count(String = str_trim(string))

# A tibble: 3 x 2
  String                  n
  <chr>               <int>
1 Journals                3
2 Networking              3
3 Social Media Groups     2

Data

df <- structure(list(string = c("Networking, Journals, Social Media Groups", 
"Journals", "Networking, Social Media Groups", "Networking, Journals"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))