I have a table that looks like this:
ID term value
1 A cat,dog,snake 10
2 B cat,eel 50
3 C fish,eel 3
4 D fish,dog 6
data.frame(ID = c("A", "B", "C", "D"),
term = c("cat,dog,snake", "cat,eel", "fish,eel", "fish,dog"),
value = c(10, 50, 3, 6))
I have a list of interest:
dog
fish
eel
What I want to do is grep each row for each item in the list and calculate the mean (value column). Like this:
term mean
1 dog 8.0
2 fish 4.5
3 eel 26.5
Where every instance where there is a 'dog'
it calculates the mean
of value
.
Something like this doesn't work:
df %>%
group_by(., grepl(list, term)) %>%
summarise(mean = mean(value))
What I don't want to do is separate each term into its own row because some rows of term have 100s of options. So the only efficient way I can think of is to group by a grep search. Though perhaps I am wrong...
CodePudding user response:
Maybe something like this?
mylist <- c("dog", "fish", "eel")
pattern <- paste0(mylist, collapse = "|")
df %>%
separate_rows(term) %>%
group_by(term = str_extract(term, pattern)) %>%
summarise(mean = mean(value, na.rm = TRUE)) %>%
na.omit()
OR
library(dplyr)
mylist <- c("dog", "fish", "eel")
df %>%
separate_rows(term) %>%
group_by(term) %>%
summarise(mean = mean(value, na.rm = TRUE)) %>%
filter(term %in% mylist)
term mean
<chr> <dbl>
1 dog 8
2 eel 26.5
3 fish 4.5
CodePudding user response:
An option with str_extract
and unnest
library(dplyr)
library(tidyr)
library(stringr)
v1 <- c("dog", "fish", "eel")
df1 %>%
mutate(term = str_extract_all(term, str_c(v1, collapse="|"))) %>%
unnest(term) %>%
group_by(term) %>%
summarise(mean = mean(value))
# A tibble: 3 × 2
term mean
<chr> <dbl>
1 dog 8
2 eel 26.5
3 fish 4.5
CodePudding user response:
Do you want this?
df %>%
separate_rows(term) %>%
filter(term %in% c("dog", "fish", "eel"))