I want to bind an entire vector to each row of a dataframe, then unnest so I can search some text for a keyword.
Basically I want to do this:
library(dplyr)
library(stringr)
df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
))
keywords <- c("apple",
              "tomato",
              "cucumbe",
              "chocolate",
              "avocado")
df %>%
  mutate(keyword = keywords)
But that results in this error:
Error: Problem with `mutate()` input `keyword`.
x Input `keyword` can't be recycled to size 6.
i Input `keyword` is `keywords`.
i Input `keyword` must be size 6 or 1, not 5.
Run `rlang::last_error()` to see where the error occurred.
So I find myself doing this intermediate step, which seems unnecessary:
keywords_string <- str_c(keywords, collapse = ",")
df %>%
  mutate(keyword = keywords_string,
         keyword = str_split(keyword, ","))
Is there a way for me to do this without first collapsing the vector to a string and then splitting the string? Thanks!
CodePudding user response:
Wrapping it in list() is an easier way to get the output of your second code block. But if you're just going to unnest() right after, you could do it all in one step with tidyr::expand_grid():
library(tidyr)
df %>%
  expand_grid(keyword = keywords)
#> # A tibble: 15 × 2
#>    text                                           keyword
#>    <chr>                                          <chr>
#>  1 I really like apples, tomoatoes, and cucumbers apple
#>  2 I really like apples, tomoatoes, and cucumbers tomato
#>  3 I really like apples, tomoatoes, and cucumbers cucumbe
#>  4 I really like apples, tomoatoes, and cucumbers chocolate
#>  5 I really like apples, tomoatoes, and cucumbers avocado
#>  6 I really like berries, chocolate, and apples   apple
#>  7 I really like berries, chocolate, and apples   tomato
#>  8 I really like berries, chocolate, and apples   cucumbe
#>  9 I really like berries, chocolate, and apples   chocolate
#> 10 I really like berries, chocolate, and apples   avocado
#> 11 I really like avocado, chocolate, and carrots  apple
#> 12 I really like avocado, chocolate, and carrots  tomato
#> 13 I really like avocado, chocolate, and carrots  cucumbe
#> 14 I really like avocado, chocolate, and carrots  chocolate
#> 15 I really like avocado, chocolate, and carrots  avocado
Created on 2021-09-24 by the reprex package (v2.0.1)
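For completeness, here is a minimal sketch of the list() route mentioned at the top of this answer, followed by the keyword search the question is after. The intermediate names (nested, long, matches) are only illustrative, and using fixed() for a literal match is my assumption about what "search for a keyword" means here.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
))
keywords <- c("apple", "tomato", "cucumbe", "chocolate", "avocado")

# list(keywords) is a list of length 1, so mutate() can recycle it to every row
nested <- df %>%
  mutate(keyword = list(keywords))

# unnest() expands the list-column to one row per text/keyword pair
long <- nested %>%
  unnest(keyword)

# keep only the pairs where the keyword occurs literally in the text;
# str_detect() is vectorised over both the string and the pattern
matches <- long %>%
  filter(str_detect(text, fixed(keyword)))
```

Note that "tomato" does not match the first sentence, because the sample text spells it "tomoatoes".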
CodePudding user response:
Just in case the OP wants to have the keywords added as separate columns:
library(tidyverse)
df %>%
  mutate(keyword = list(keywords)) %>%
  unnest_wider(keyword) %>%
  rename_with(.cols = starts_with("..."), ~paste0("keyword_", 1:length(keywords)))
which gives:
# A tibble: 3 x 6
  text                                           keyword_1 keyword_2 keyword_3 keyword_4 keyword_5
  <chr>                                          <chr>     <chr>     <chr>     <chr>     <chr>
1 I really like apples, tomoatoes, and cucumbers apple     tomato    cucumbe   chocolate avocado
2 I really like berries, chocolate, and apples   apple     tomato    cucumbe   chocolate avocado
3 I really like avocado, chocolate, and carrots  apple     tomato    cucumbe   chocolate avocado
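A possible variant, in case the rename step gives trouble: unnest_wider() has a names_sep argument that numbers unnamed elements itself, so the columns come out as keyword_1 to keyword_5 without rename_with(). As far as I know, newer tidyr releases actually require names_sep when the vector is unnamed, but treat that as an assumption and check your installed version. The name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
))
keywords <- c("apple", "tomato", "cucumbe", "chocolate", "avocado")

# names_sep pastes the outer column name and an integer index together,
# so the unnamed vector elements become keyword_1, keyword_2, ...
wide <- df %>%
  mutate(keyword = list(keywords)) %>%
  unnest_wider(keyword, names_sep = "_")
```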
CodePudding user response:
Another option is crossing():
library(dplyr)
library(tidyr)
df %>%
  crossing(keywords)
-output
# A tibble: 15 x 2
   text                                           keywords
   <chr>                                          <chr>
 1 I really like apples, tomoatoes, and cucumbers apple
 2 I really like apples, tomoatoes, and cucumbers avocado
 3 I really like apples, tomoatoes, and cucumbers chocolate
 4 I really like apples, tomoatoes, and cucumbers cucumbe
 5 I really like apples, tomoatoes, and cucumbers tomato
 6 I really like avocado, chocolate, and carrots  apple
 7 I really like avocado, chocolate, and carrots  avocado
 8 I really like avocado, chocolate, and carrots  chocolate
 9 I really like avocado, chocolate, and carrots  cucumbe
10 I really like avocado, chocolate, and carrots  tomato
11 I really like berries, chocolate, and apples   apple
12 I really like berries, chocolate, and apples   avocado
13 I really like berries, chocolate, and apples   chocolate
14 I really like berries, chocolate, and apples   cucumbe
15 I really like berries, chocolate, and apples   tomato
Regarding the OP's comment about the difference between expand_grid() and crossing(): the latter de-duplicates (and sorts) its inputs, which matters if there are duplicate elements:
> expand_grid(v1 = rep(letters[1:2], each = 2), v2 = rep(letters[3:4], each = 2))
# A tibble: 16 x 2
v1 v2
<chr> <chr>
1 a c
2 a c
3 a d
4 a d
5 a c
6 a c
7 a d
8 a d
9 b c
10 b c
11 b d
12 b d
13 b c
14 b c
15 b d
16 b d
> crossing(v1 = rep(letters[1:2], each = 2), v2 = rep(letters[3:4], each = 2))
# A tibble: 4 x 2
v1 v2
<chr> <chr>
1 a c
2 a d
3 b c
4 b d
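Applied to the data in this thread, where neither the texts nor the keywords contain duplicates, the two functions should return the same 15 rows and differ only in row order. A small sketch to check that (the names grid, crossed and same_rows are only illustrative):

```r
library(dplyr)
library(tidyr)

df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
))
keywords <- c("apple", "tomato", "cucumbe", "chocolate", "avocado")

# expand_grid() keeps the input order; crossing() sorts its inputs
grid    <- expand_grid(df, keyword = keywords)
crossed <- crossing(df, keyword = keywords)

# after sorting both the same way, the two results should match
same_rows <- isTRUE(all.equal(
  arrange(grid, text, keyword),
  arrange(crossed, text, keyword)
))
```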