I have a data frame:
details <- "I stay in april in Barcelona, after in may in Paris and finally in july in London. april is a cool month. july has very hot temperatures"
df <- as.data.frame(details)
I can extract the month names that come before a city into a new column with mutate and str_extract from the tidyverse package:
library(tidyverse)
dfmonths <- df %>%
  mutate(city_months = str_extract(details, "april|may|july(?=\\sin)"))
...but only the first matched month is in my result:
details city_months
1 I stay in april in Barcelona (...) and finally in july in Paris.... april
I tried:
library(tidyverse)
dfmonths <- df %>%
  mutate(city_months = str_extract_all(details,
                                       "april|may|july(?=\\sin)"))
but I still get only one row:
details city_months
1 I stay in april in Barcelona (...) and finally in july in Paris.... april, may, july
How can I get one row per matched month, like this:
details city_months
1 I stay in april in Barcelona (...) and finally in july in Paris.... april
2 I stay in april in Barcelona (...) and finally in july in Paris.... may
3 I stay in april in Barcelona (...) and finally in july in Paris.... july
CodePudding user response:
Use str_extract_all, which returns a list column, then unnest that column so each match gets its own row, i.e.
library(dplyr)
library(stringr)
library(tidyr)
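df %>%
  # str_extract_all returns every match, stored as a list column
  mutate(new = str_extract_all(details, "april|may|july(?=\\sin)")) %>%
  # name the column explicitly; a bare unnest() warns in tidyr >= 1.0
  unnest(new) %>%
  # drop the repeated "april" matched in "april is a cool month"
  distinct()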
# A tibble: 3 x 2
details new
<chr> <chr>
1 I stay in april in Barcelona, after in may in Paris and finally in july in London. april is a cool month. july has very hot temperatures april
2 I stay in april in Barcelona, after in may in Paris and finally in july in London. april is a cool month. july has very hot temperatures may
3 I stay in april in Barcelona, after in may in Paris and finally in july in London. april is a cool month. july has very hot temperatures july
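A side note on the pattern, offered as a sketch rather than as part of the answer above: in "april|may|july(?=\\sin)" the lookahead is attached only to the july alternative, so a bare "april" or "may" matches anywhere in the text. That is why distinct() is needed to drop the second "april". Assuming the intent is "a month followed by ' in'", grouping the alternatives applies the lookahead to all three. The variable names ungrouped, grouped and result below are only illustrative.

```r
library(dplyr)
library(stringr)
library(tidyr)

details <- "I stay in april in Barcelona, after in may in Paris and finally in july in London. april is a cool month. july has very hot temperatures"
df <- data.frame(details = details)

# Ungrouped: (?=\\sin) binds only to "july", so the "april" in
# "april is a cool month" is matched as well.
ungrouped <- str_extract_all(details, "april|may|july(?=\\sin)")[[1]]

# Grouped: the lookahead now applies to every alternative.
grouped <- str_extract_all(details, "(april|may|july)(?=\\sin)")[[1]]

# Same pipeline as above with the grouped pattern; distinct() is no
# longer needed for this input because no duplicate is matched.
result <- df %>%
  mutate(city_months = str_extract_all(details, "(april|may|july)(?=\\sin)")) %>%
  unnest(city_months)

print(ungrouped)
print(grouped)
print(result$city_months)
```

For this input the ungrouped pattern returns four matches (april, may, july, april) and the grouped one returns three (april, may, july). Whether to group depends on whether months mentioned without a following " in" should be counted.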