Better way to bind an entire vector/list to every row of a dataframe in R?

Time:09-25

I want to bind an entire vector to each row of a dataframe, then unnest so I can search some text for a keyword.

Basically I want to do this:

library(dplyr)
library(stringr)

df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
  ))

keywords <- c("apple",
              "tomato",
              "cucumbe",
              "chocolate",
              "avocado")

df %>% 
  mutate(keyword = keywords)

But that results in this error:

Error: Problem with `mutate()` input `keyword`.
x Input `keyword` can't be recycled to size 3.
i Input `keyword` is `keywords`.
i Input `keyword` must be size 3 or 1, not 5.
Run `rlang::last_error()` to see where the error occurred.

So I find myself doing this intermediate step, which seems unnecessary:

keywords_string <- str_c(keywords, collapse = ",")

df %>%
  mutate(keyword = keywords_string,
         keyword = str_split(keyword, ","))

Is there a way for me to do this without first collapsing the vector to a string and then splitting the string? Thanks!

CodePudding user response:

Wrapping the vector in list() is an easier way to get the output of your second code block: a length-1 list can be recycled to every row. But if you're just going to unnest right after, you could do it all in one step with tidyr::expand_grid:

library(tidyr)

df %>% 
  expand_grid(keyword = keywords)
#> # A tibble: 15 × 2
#>    text                                           keyword  
#>    <chr>                                          <chr>    
#>  1 I really like apples, tomoatoes, and cucumbers apple    
#>  2 I really like apples, tomoatoes, and cucumbers tomato   
#>  3 I really like apples, tomoatoes, and cucumbers cucumbe  
#>  4 I really like apples, tomoatoes, and cucumbers chocolate
#>  5 I really like apples, tomoatoes, and cucumbers avocado  
#>  6 I really like berries, chocolate, and apples   apple    
#>  7 I really like berries, chocolate, and apples   tomato   
#>  8 I really like berries, chocolate, and apples   cucumbe  
#>  9 I really like berries, chocolate, and apples   chocolate
#> 10 I really like berries, chocolate, and apples   avocado  
#> 11 I really like avocado, chocolate, and carrots  apple    
#> 12 I really like avocado, chocolate, and carrots  tomato   
#> 13 I really like avocado, chocolate, and carrots  cucumbe  
#> 14 I really like avocado, chocolate, and carrots  chocolate
#> 15 I really like avocado, chocolate, and carrots  avocado

Created on 2021-09-24 by the reprex package (v2.0.1)
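
For completeness, here is a minimal sketch of the list() route mentioned above, followed by the keyword search the question is aiming for. The intermediate names (nested, long, matches) and the choice of str_detect() with fixed() are assumptions made for illustration, not something stated in the question.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- tibble(text = c(
  "I really like apples, tomoatoes, and cucumbers",
  "I really like berries, chocolate, and apples",
  "I really like avocado, chocolate, and carrots"
))
keywords <- c("apple", "tomato", "cucumbe", "chocolate", "avocado")

# list(keywords) has length 1, so mutate() can recycle it to every row
nested <- df %>%
  mutate(keyword = list(keywords))

# unnest() expands the list-column to one row per text/keyword pair
long <- nested %>%
  unnest(keyword)

# keep only the pairs where the keyword literally occurs in the text
matches <- long %>%
  filter(str_detect(text, fixed(keyword)))

matches
```

Here long should hold the same text/keyword pairs as the expand_grid() result above, so the two approaches are interchangeable for this purpose.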

CodePudding user response:

Just in case the OP wants to have the keywords added as separate columns:

library(tidyverse)
df %>%
  # wrap the vector in a list so it is recycled to every row
  mutate(keyword = list(keywords)) %>%
  # names_sep names the new columns keyword_1, keyword_2, ...
  # (recent tidyr versions require it for unnamed elements)
  unnest_wider(keyword, names_sep = "_")

which gives:

# A tibble: 3 x 6
  text                                           keyword_1 keyword_2 keyword_3 keyword_4 keyword_5
  <chr>                                          <chr>     <chr>     <chr>     <chr>     <chr>    
1 I really like apples, tomoatoes, and cucumbers apple     tomato    cucumbe   chocolate avocado  
2 I really like berries, chocolate, and apples   apple     tomato    cucumbe   chocolate avocado  
3 I really like avocado, chocolate, and carrots  apple     tomato    cucumbe   chocolate avocado 

CodePudding user response:

Another option is tidyr::crossing:

library(dplyr)
library(tidyr)
df %>%
   crossing(keywords)

which gives:

# A tibble: 15 x 2
   text                                           keywords 
   <chr>                                          <chr>    
 1 I really like apples, tomoatoes, and cucumbers apple    
 2 I really like apples, tomoatoes, and cucumbers avocado  
 3 I really like apples, tomoatoes, and cucumbers chocolate
 4 I really like apples, tomoatoes, and cucumbers cucumbe  
 5 I really like apples, tomoatoes, and cucumbers tomato   
 6 I really like avocado, chocolate, and carrots  apple    
 7 I really like avocado, chocolate, and carrots  avocado  
 8 I really like avocado, chocolate, and carrots  chocolate
 9 I really like avocado, chocolate, and carrots  cucumbe  
10 I really like avocado, chocolate, and carrots  tomato   
11 I really like berries, chocolate, and apples   apple    
12 I really like berries, chocolate, and apples   avocado  
13 I really like berries, chocolate, and apples   chocolate
14 I really like berries, chocolate, and apples   cucumbe  
15 I really like berries, chocolate, and apples   tomato   

Regarding the OP's comment about the difference between expand_grid and crossing: the latter removes duplicates and sorts the result, which matters if there are duplicate elements in the inputs:

> expand_grid(v1 = rep(letters[1:2], each = 2), v2 = rep(letters[3:4], each = 2))
# A tibble: 16 x 2
   v1    v2   
   <chr> <chr>
 1 a     c    
 2 a     c    
 3 a     d    
 4 a     d    
 5 a     c    
 6 a     c    
 7 a     d    
 8 a     d    
 9 b     c    
10 b     c    
11 b     d    
12 b     d    
13 b     c    
14 b     c    
15 b     d    
16 b     d    
> crossing(v1 = rep(letters[1:2], each = 2), v2 = rep(letters[3:4], each = 2))
# A tibble: 4 x 2
  v1    v2   
  <chr> <chr>
1 a     c    
2 a     d    
3 b     c    
4 b     d    
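
As a rough check of that description, the sketch below rebuilds the crossing() result by hand from expand_grid(), using distinct() and arrange(). The variable names (full, manual, crossed) are made up for this example.

```r
library(dplyr)
library(tidyr)

v1 <- rep(letters[1:2], each = 2)
v2 <- rep(letters[3:4], each = 2)

# expand_grid() keeps every combination, duplicates included
full <- expand_grid(v1 = v1, v2 = v2)

# dropping duplicate rows and sorting by hand
manual <- full %>%
  distinct() %>%
  arrange(v1, v2)

# crossing() does the de-duplication and sorting itself
crossed <- crossing(v1 = v1, v2 = v2)

# compare the hand-built result with crossing()
all.equal(manual, crossed)
```

So when the inputs are already unique and the row order does not matter, either function should work for the keyword problem above.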