How to use regex with the same prefix but different suffix?


Let's say that my data has the following structure:

Data<-structure(list(Date = structure(c(17955, 17955, 17954, 17954, 
17953), class = "Date"), name = c("QLD to SA", "QLD.NSW to SA.NSW", 
"QLD to SA", "QLD.NSW to SA.NSW", "QLD to SA"), value = c(-2.33611657245688, 
-1.48768446629906, -2.36699803453011, -1.46423011205677, -2.32284554692339
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

I want to create a new column called Group. Its value depends on which regular expression matches the name column. The output I want looks like this:

 Date       name              value Group
2019-02-28 QLD to SA         -2.34 QLD  
2019-02-28 QLD.NSW to SA.NSW -1.49 QLD-NSW  
2019-02-27 QLD to SA         -2.37 QLD  
2019-02-27 QLD.NSW to SA.NSW -1.46 QLD-NSW  
2019-02-26 QLD to SA         -2.32 QLD  

I think something like this could work:

    Data %>%
      mutate(Group = case_when(
        str_detect(name, regex("QLD", ignore_case=TRUE)) ~ "QLD",
        str_detect(name, regex("^QLD.NSW", ignore_case=TRUE)) ~ "QLD-NSW",
        TRUE ~ "number"))

It fails because only the first case, QLD, is ever matched: the Group column stops there and never reaches the second case, QLD-NSW.

CodePudding user response:

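You just need to swap the two lines of code starting with str_detect. case_when() uses the first condition that is TRUE for each row, and "QLD" also matches "QLD.NSW to SA.NSW", so the more specific pattern has to come first. Escaping the dot (\\.) also makes it match a literal period instead of any character.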

Please find below a reprex.


  • Code

  library(dplyr)
  library(stringr)
  Data %>% mutate(Group = case_when(
    str_detect(name, regex("^QLD\\.NSW", ignore_case=TRUE)) ~ "QLD-NSW",
    str_detect(name, regex("QLD", ignore_case=TRUE)) ~ "QLD",
    TRUE ~ "number"))
  • Output
#> # A tibble: 5 x 4
#>   Date       name              value Group  
#>   <date>     <chr>             <dbl> <chr>  
#> 1 2019-02-28 QLD to SA         -2.34 QLD    
#> 2 2019-02-28 QLD.NSW to SA.NSW -1.49 QLD-NSW
#> 3 2019-02-27 QLD to SA         -2.37 QLD    
#> 4 2019-02-27 QLD.NSW to SA.NSW -1.46 QLD-NSW
#> 5 2019-02-26 QLD to SA         -2.32 QLD

Created on 2022-03-22 by the reprex package (v2.0.1)
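
To see why the order matters, here is a minimal base-R sketch that is not part of the original reprex. It assumes that grepl() and nested ifelse() are acceptable stand-ins for str_detect() and case_when(); the variable names (general, specific, wrong_order, right_order) are made up for illustration.

```r
# Two representative values from the name column of the question.
name <- c("QLD to SA", "QLD.NSW to SA.NSW")

# The general pattern matches both strings; the specific one only the second.
general  <- grepl("QLD", name, ignore.case = TRUE)
specific <- grepl("^QLD\\.NSW", name, ignore.case = TRUE)

# Nested ifelse() mimics the first-match-wins behaviour of case_when().
# General pattern first: the specific branch is never reached.
wrong_order <- ifelse(general, "QLD", ifelse(specific, "QLD-NSW", "number"))

# Specific pattern first: each row gets the intended label.
right_order <- ifelse(specific, "QLD-NSW", ifelse(general, "QLD", "number"))

print(wrong_order)  # "QLD" "QLD"
print(right_order)  # "QLD" "QLD-NSW"
```

The same reasoning applies to any set of overlapping patterns: put the most specific condition first, or tighten the general one so it can no longer match the specific strings.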
