Using str_detect and case_when with dplyr in R

Time:02-27

This is my df:

mydf <- structure(list(Action = c("Passes accurate", "Passes accurate", 
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)", 
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions", 
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

I have this vector: passes <- c('Passes','passes','Assists','Crosses')

I am trying to do this: mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))

But only the first row is filled with passes. I expected the first 4 rows, and also the 7th row, to be filled with passes. How can I achieve this with the case_when function?

CodePudding user response:

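str_detect() is vectorised over both string and pattern, so a length-4 pattern vector is compared with Action element by element rather than tried as a set of alternatives. Collapse passes into a single regular expression with | instead: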

library(tidyverse)
mydf %>%
  mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))

# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
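
Since the question asks for case_when specifically, the same collapsed pattern should also work there. This is a minimal sketch, assuming mydf and passes as defined in the question (mydf is rebuilt with tibble() so the example is self-contained); the names pattern and result are introduced here only for illustration.

```r
library(dplyr)
library(stringr)

# Data and vector from the question
mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))
passes <- c("Passes", "passes", "Assists", "Crosses")

# One regex with alternation: "Passes|passes|Assists|Crosses"
pattern <- str_c(passes, collapse = "|")

result <- mydf %>%
  mutate(newcol = case_when(
    str_detect(Action, pattern) ~ "passes"
    # rows that match no condition are left as NA
  ))

print(result)
```

Further categories can be added as extra `condition ~ value` lines inside case_when. The entries in passes are plain words here; if they could contain regex metacharacters such as ( or ., they would need escaping before being collapsed.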

CodePudding user response:

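I used str_sub() for this. It only checks for the literal prefix 'Passes', not the other entries in passes. The result has to be assigned back to mydf, otherwise print(mydf) shows the original data without newcol.

library(tidyverse)
mydf <- mydf %>% mutate(newcol = case_when(str_sub(Action, 1, 6) == 'Passes' ~ "passes"))

print(mydf)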
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
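
The prefix check can be extended to every entry in passes by anchoring a collapsed pattern at the start of the string. This is a sketch under the same assumptions as the question (mydf is rebuilt so the example is self-contained); prefix_pattern and prefixed are names introduced here for illustration.

```r
library(dplyr)
library(stringr)

# Data and vector from the question
mydf <- tibble(Action = c(
  "Passes accurate", "Passes accurate", "Passes accurate", "Passes accurate",
  "Lost balls", "Lost balls (in opp. half)", "Passes (inaccurate)",
  "Interceptions (in opp. half)", "Interceptions", "Positional attacks"
))
passes <- c("Passes", "passes", "Assists", "Crosses")

# "^(?:Passes|passes|Assists|Crosses)": any entry, but only at the start
prefix_pattern <- str_c("^(?:", str_c(passes, collapse = "|"), ")")

prefixed <- mydf %>%
  mutate(newcol = case_when(str_detect(Action, prefix_pattern) ~ "passes"))

print(prefixed)
```

Unlike the unanchored pattern, this would not label a value such as "Accurate passes", because the match must begin at the first character.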

CodePudding user response:

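Another option is regex_left_join() from fuzzyjoin, which joins each Action to the rows of a lookup table whose regex it matches. An Action that matches more than one pattern would appear once per match.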

library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
    by = c("Action" = "passes")) %>%
   select(-passes)

-output

# A tibble: 10 × 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  