Home > Software engineering >  Grep a list of terms, group by and summarise values using dplyr
Grep a list of terms, group by and summarise values using dplyr

Time:10-25

I have a table that looks like this:

  ID          term value
1  A cat,dog,snake    10
2  B       cat,eel    50
3  C      fish,eel     3
4  D      fish,dog     6

data.frame(ID = c("A", "B", "C", "D"),
           term = c("cat,dog,snake", "cat,eel", "fish,eel", "fish,dog"),
           value = c(10, 50, 3, 6))

I have a list of interest:

dog
fish
eel

What I want to do is grep each row for each item in the list and calculate the mean (value column). Like this:

  term mean
1  dog  8.0
2 fish  4.5
3  eel 26.5

Where every instance where there is a 'dog' it calculates the mean of value.

Something like this doesn't work:

df %>% 
  group_by(., grepl(list, term)) %>% 
  summarise(mean = mean(value))

What I don't want to do is separate each term into its own row because some rows of term have 100s of options. So the only efficient way I can think of is to group by a grep search. Though perhaps I am wrong...

CodePudding user response:

Maybe something like this?

mylist <- c("dog", "fish", "eel")
pattern <- paste0(mylist, collapse = "|")


df %>% 
  separate_rows(term) %>% 
  group_by(term = str_extract(term, pattern)) %>% 
  summarise(mean = mean(value, na.rm = TRUE)) %>% 
  na.omit()

OR

library(dplyr)

mylist <- c("dog", "fish", "eel")

df %>% 
  separate_rows(term) %>% 
  group_by(term) %>% 
  summarise(mean = mean(value, na.rm = TRUE)) %>% 
  filter(term %in% mylist)
 term   mean
  <chr> <dbl>
1 dog     8  
2 eel    26.5
3 fish    4.5

CodePudding user response:

An option with str_extract and unnest

library(dplyr)
library(tidyr)
library(stringr)
v1 <-  c("dog", "fish", "eel")
df1 %>%
  mutate(term = str_extract_all(term, str_c(v1, collapse="|"))) %>% 
  unnest(term) %>% 
  group_by(term) %>% 
  summarise(mean = mean(value))
# A tibble: 3 × 2
  term   mean
  <chr> <dbl>
1 dog     8  
2 eel    26.5
3 fish    4.5

CodePudding user response:

Do you want this?


df %>% 
  separate_rows(term) %>% 
  filter(term %in% c("dog", "fish", "eel"))
  • Related