Remove a specific part of a string in R with stringr

I am currently working on a project where I have to remove the parts of a string before and after a specific piece. I have attached an example below, and I can only use the packages stringr, tidyverse and dplyr. The examples have different lengths, but I only need to keep the "r1" or "r2" part. There are r1 to r4 across 96 different examples. Can anybody help me keep only this part of the variable, so that I end up with a variable containing only r1, r2, r3 and r4? Thanks in advance.

[19] "data/r1-23-8-312.json"    "data/r1-23-8-66.json"     "data/r1-23-8-68.json"    
[22] "data/r1-23-8-85.json"     "data/r1-23-8-88.json"     "data/r2-65-12-200.json"  
[25] "data/r2-65-12-202.json"   "data/r2-65-12-214.json"   "data/r2-65-12-215.json"  

class(dat2$route)
[1] "character"

I have figured out that I can use "substr(dat2$route, 6, 7)", but if I use it this way:

dat2 <- substr(dat2$route, 6, 7)

it removes all the other variables besides route. Why is that? I have 11 other variables as well.

CodePudding user response：

There are several ways. If your strings always start with data/, you can do

library(tidyverse)
dat2 %>%
  mutate(new_route = str_sub(route, start = 6L, end = 7L))

Other options are to extract the 'r' followed by a number, or to remove the data/ prefix and everything after the rX part.
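The two other options mentioned above could be sketched as follows. The sample vector `route` is made up from the paths shown in the question, and the regular expressions are assumptions about the file naming, not something the asker confirmed.

```r
library(stringr)

# Sample paths copied from the question (assumed to be representative)
route <- c("data/r1-23-8-312.json", "data/r2-65-12-200.json")

# Option 1: extract an 'r' followed by a single digit from 1 to 4
extracted <- str_extract(route, "r[1-4]")

# Option 2: remove the "data/" prefix, then everything from the first "-" onwards
stripped <- str_remove(str_remove(route, "^data/"), "-.*$")

extracted
stripped
```

Option 1 does not depend on the position of the route label in the string, so it should tolerate a different prefix; option 2 relies on the label being followed directly by a hyphen.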

CodePudding user response：

If we want to be more strict, we can use stringr::str_match() to capture an r followed by a digit from 1 to 4, located between / and -.

The first column of matching contains the whole match, and the second contains the capture group, which is created by surrounding part of the pattern with parentheses.

library(tidyverse)

data <- 
c("data/r1-23-8-312.json",    "data/r1-23-8-66.json",     "data/r1-23-8-68.json",    
"data/r1-23-8-85.json",     "data/r1-23-8-88.json",     "data/r2-65-12-200.json" , 
"data/r2-65-12-202.json",   "data/r2-65-12-214.json" ,  "data/r2-65-12-215.json") 

(matching <- stringr::str_match(data, '/(r[1-4])-'))
#>       [,1]   [,2]
#>  [1,] "/r1-" "r1"
#>  [2,] "/r1-" "r1"
#>  [3,] "/r1-" "r1"
#>  [4,] "/r1-" "r1"
#>  [5,] "/r1-" "r1"
#>  [6,] "/r2-" "r2"
#>  [7,] "/r2-" "r2"
#>  [8,] "/r2-" "r2"
#>  [9,] "/r2-" "r2"

matching[, 2]
#> [1] "r1" "r1" "r1" "r1" "r1" "r2" "r2" "r2" "r2"

Created on 2021-12-07 by the reprex package (v2.0.1)

CodePudding user response：

library(stringr)
str_extract(dat2$route, pattern = "r[0-9]")
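As for why the other variables disappeared in the question: `dat2 <- substr(dat2$route, 6, 7)` assigns the resulting character vector to the name `dat2`, so the whole data frame is replaced by that vector. Assigning to a column instead should keep the rest. A minimal sketch, where the data frame and its `other` column are hypothetical stand-ins for the real data:

```r
library(stringr)

# Hypothetical stand-in for the asker's data frame (the real one has 12 variables)
dat2 <- data.frame(
  route = c("data/r1-23-8-312.json", "data/r2-65-12-200.json"),
  other = c(1, 2),
  stringsAsFactors = FALSE
)

# Assign to a column, not to dat2 itself, so the other variables are kept
dat2$route <- str_extract(dat2$route, pattern = "r[0-9]")

dat2
```

If the original paths are still needed, the result could be written to a new column name instead of overwriting route.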