I am trying to write a function that extracts the text surrounding a keyword. If the keyword occurs more than once, the surroundings of each occurrence should be combined in the final output. The current version works well on a single string with 2 keyword instances :) However, it does NOT work when used inside a tidy pipe with mutate. To check that mutate operates on each string separately instead of concatenating the whole column into a single string, I wrote a simple test function, "first_letter", and it works well.
library(dplyr) # provides %>% and mutate used below
submarine <- 'We all live in a yellow submarine'
yesterday <- 'Yesterday all my troubles seemed so far away, all of them'
my_data <- data.frame(text=c(submarine, yesterday))
pat <- "all"
first_letter <- function(x){
fl_res <- substr(x,1,1)
return(fl_res)
}
my_data_fl <- my_data %>% dplyr::mutate(first=first_letter(text))
# Target function that works with string but not within mutate
# loc is a data.frame generated within the function
# pat is the keyword
# I tried to replace x with .data, but it does not help
term_surr <- function(x, pat, before_char=5, after_char=15){
loc <- x %>%
stringr::str_locate_all(pat) %>%
as.data.frame() %>%
tibble::as_tibble %>%
dplyr::mutate(from=start - before_char) %>%
dplyr::mutate(to=end + after_char) %>%
dplyr::select(from, to) %>%
dplyr::mutate(surr = stringr::str_sub(x, .data$from, .data$to))
res_txt <- purrr::map_chr(loc$surr, ~paste(.x, sep = ". ")) %>% stringi::stri_paste(collapse=' ... ')
return(res_txt)
}
# FUNCTIONAL with text input as string
# surr <- term_surr(yesterday, pat=pat)
# NOT FUNCTIONAL with dataframe column
# my_data_surr <- my_data %>% mutate(surr= term_surr(text, pat=pat))
If there is a tutorial on using tidy/dplyr pipes within functions, please share a link. I would appreciate any suggestions about the code above.
CodePudding user response:
The complete code would be:
library(tidyverse)
submarine <- 'We all live in a yellow submarine'
yesterday <- 'Yesterday all my troubles seemed so far away, all of them'
my_data <- data.frame(text=c(submarine, yesterday))
pat <- "all"
res <- my_data %>%
mutate(surr = str_locate_all(text, pat) %>%
map(~ as.data.frame(.x) %>%
transmute(from = start - 2, to = end + 15))) %>%
unnest(surr, names_sep = ".") %>%
mutate(surr = str_sub(text, surr.from, surr.to)) %>%
group_by(text) %>%
summarise(surr = str_c(surr, collapse = " ... ")) %>%
pull(surr) %>%
bind_cols(., my_data) %>%
rename_all(~ c("surr", "text")) %>% # funs() is deprecated; a formula does the same job
select(2,1)
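An alternative that stays closer to the original function, offered as a sketch rather than a definitive fix: mutate hands the function the whole column at once, and str_locate_all then returns a list with one match matrix per row, which is probably why the as.data.frame() step breaks. One way around this is to keep a helper that handles exactly one string and call it once per row with purrr::map_chr. The name term_surr_one is hypothetical, and clamping the start position to 1 and returning NA when there is no match are assumptions added here, not part of the original code.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical helper (not from the original post): handles exactly ONE string.
term_surr_one <- function(x, pat, before_char = 5, after_char = 15) {
  # str_locate_all() returns a list; [[1]] is the start/end matrix for x
  loc <- stringr::str_locate_all(x, pat)[[1]]
  # Assumption: return NA when the keyword does not occur at all
  if (nrow(loc) == 0) return(NA_character_)
  # Assumption: clamp to 1, because str_sub() counts negative
  # start positions from the end of the string
  from <- pmax(loc[, "start"] - before_char, 1)
  to <- loc[, "end"] + after_char
  # One surrounding snippet per match, joined into a single string
  paste(stringr::str_sub(x, from, to), collapse = " ... ")
}

my_data <- data.frame(
  text = c("We all live in a yellow submarine",
           "Yesterday all my troubles seemed so far away, all of them"),
  stringsAsFactors = FALSE
)

# map_chr() calls the helper once per row, so each call sees a single string
my_data_surr <- my_data %>%
  mutate(surr = map_chr(text, ~ term_surr_one(.x, pat = "all")))
```

This keeps the row order of my_data and needs no unnest/group_by round trip, at the cost of one function call per row.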