Counting number of strings despite multiple elements in one cell

I have a vector A <- c("Tom; Jerry", "Lisa; Marc") and am trying to identify the number of occurrences of every name.

I already used the code: sort(table(unlist(strsplit(A, ""))), decreasing = TRUE)

However, this code only produces output like this: Tom; Jerry: 1 - Lisa; Marc: 1

I am looking for a way to count every name, even though two names can be present in one cell. My preferred result would be:

Tom: 1 Jerry: 1 Lisa: 1 Marc: 1

CodePudding user response：

The split pattern should be a semicolon followed by zero or more whitespace characters (;\\s*):

sort(table(unlist(strsplit(A, ";\\s*"))), decreasing = TRUE)

-output

Jerry  Lisa  Marc   Tom 
    1     1     1     1
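
To see why the pattern matters, here is a minimal sketch in base R. The vector B and the trimws() call are my own additions for illustration, not part of the question: B has a repeated name, a missing space after the semicolon, and a trailing space, which are cases the pattern alone may not fully cover.

```r
A <- c("Tom; Jerry", "Lisa; Marc")

# An empty pattern splits a string into single characters, not names
chars <- strsplit(A, "")[[1]]
print(chars)

# Hypothetical messier input (assumption, not from the question)
B <- c("Tom; Jerry", "Lisa;Tom ", "Marc")

# ";\\s*" splits on each semicolon plus any whitespace that follows it
names_vec <- unlist(strsplit(B, ";\\s*"))

# trimws() removes leftover leading/trailing spaces such as in "Tom "
names_vec <- trimws(names_vec)

# Count each name and put the most frequent first
counts <- sort(table(names_vec), decreasing = TRUE)
print(counts)
```

Without trimws(), "Tom" and "Tom " would be counted as two different names, so it is a cheap safeguard when the input is not perfectly regular.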

CodePudding user response：

Use separate_rows to split the strings, group_by the names and summarise them:

library(tidyverse)
data.frame(A) %>%
  separate_rows(A, sep = "; ") %>%
  group_by(A) %>%
  summarise(N = n())
# A tibble: 4 × 2
  A         N
  <chr> <int>
1 Jerry     1
2 Lisa      1
3 Marc      1
4 Tom       1
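
A shorter variant of the same idea, as a sketch: count() from dplyr does the group_by() and summarise(N = n()) steps in one call, and separate_longer_delim() is the newer tidyr function for this kind of split. This assumes tidyr 1.3.0 or later is installed; with older versions, separate_rows() as above should work the same way here.

```r
library(dplyr)
library(tidyr)

A <- c("Tom; Jerry", "Lisa; Marc")

result <- data.frame(A) %>%
  # one row per name; delim is a fixed string, not a regex
  separate_longer_delim(A, delim = "; ") %>%
  # equivalent to group_by(A) %>% summarise(N = n())
  count(A, name = "N")

print(result)
```

Because delim is matched literally, entries without a space after the semicolon would not be split by this version; the regex-based strsplit approach is more forgiving in that case.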