I am currently working on a project where I have to remove the parts of a string before and after a certain piece. I have attached an example at the bottom, and I am only able to use the packages stringr, tidyverse and dplyr. The examples have different lengths, but I only need to keep the "r1" or "r2" part. There are r1 to r4 for 96 different examples. Is anybody able to help me keep only this part of the variable, so that I have a variable containing only r1, r2, r3 and r4? Thanks in advance.
[19] "data/r1-23-8-312.json" "data/r1-23-8-66.json" "data/r1-23-8-68.json"
[22] "data/r1-23-8-85.json" "data/r1-23-8-88.json" "data/r2-65-12-200.json"
[25] "data/r2-65-12-202.json" "data/r2-65-12-214.json" "data/r2-65-12-215.json"
class(dat2$route)
[1] "character"
I have figured out that I can use "substr(dat2$route, 6, 7)", but if I use it this way:
dat2 <- substr(dat2$route, 6, 7)
It removes all the other variables besides route. Why is that? I have 11 other variables as well.
CodePudding user response:
There are several ways. If your strings always start with data/, you can do:
library(tidyverse)
# Assign the result back so the new column is kept alongside the others
dat2 <- dat2 %>%
  mutate(new_route = str_sub(route, start = 6L, end = 7L))
Other options are to extract the 'r' followed by a number, or to remove the data/ prefix and everything after the rX part. Plenty of options.
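The two other options mentioned above could look roughly like the following. This is only a sketch on a small made-up vector named routes (not the asker's actual dat2), and it assumes every string has the form data/rX-... with a single digit after the r.

```r
library(stringr)

# Small illustrative vector; `routes` is a stand-in for dat2$route.
routes <- c("data/r1-23-8-312.json", "data/r2-65-12-200.json")

# Option A: extract the first "r" that is followed by a single digit.
extracted <- str_extract(routes, "r[0-9]")

# Option B: strip the "data/" prefix, then drop everything from the
# first hyphen to the end of the string.
stripped <- str_remove(str_remove(routes, "^data/"), "-.*$")

extracted
#> [1] "r1" "r2"
stripped
#> [1] "r1" "r2"
```

Option A is shorter, while Option B does not depend on the part being exactly two characters long.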
CodePudding user response:
If we want to be stricter, we can use stringr::str_match() to capture an r followed by a digit from 1 to 4, located between / and -. The first column of matching will contain the whole match, and the second the capture group made by surrounding the pattern with parentheses.
library(tidyverse)
data <- c(
  "data/r1-23-8-312.json", "data/r1-23-8-66.json", "data/r1-23-8-68.json",
  "data/r1-23-8-85.json", "data/r1-23-8-88.json", "data/r2-65-12-200.json",
  "data/r2-65-12-202.json", "data/r2-65-12-214.json", "data/r2-65-12-215.json"
)
(matching <- stringr::str_match(data, '/(r[1-4])-'))
#> [,1] [,2]
#> [1,] "/r1-" "r1"
#> [2,] "/r1-" "r1"
#> [3,] "/r1-" "r1"
#> [4,] "/r1-" "r1"
#> [5,] "/r1-" "r1"
#> [6,] "/r2-" "r2"
#> [7,] "/r2-" "r2"
#> [8,] "/r2-" "r2"
#> [9,] "/r2-" "r2"
matching[, 2]
#> [1] "r1" "r1" "r1" "r1" "r1" "r2" "r2" "r2" "r2"
Created on 2021-12-07 by the reprex package (v2.0.1)
CodePudding user response:
library(stringr)
str_extract(dat2$route, pattern = "r[0-9]")
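As for why the other variables disappeared in the question: dat2 <- substr(dat2$route, 6, 7) replaces the entire data frame with the character vector that substr() returns, so only the extracted values remain. A minimal sketch of assigning to a single column instead, assuming a toy data frame (the weight column is made up for illustration and is not from the original data):

```r
library(stringr)

# Toy stand-in for the asker's data; `weight` is a hypothetical extra column.
dat2 <- data.frame(
  route  = c("data/r1-23-8-312.json", "data/r2-65-12-200.json"),
  weight = c(1.5, 2.0),
  stringsAsFactors = FALSE
)

# Overwrite only the route column, not the whole data frame.
dat2$route <- str_extract(dat2$route, "r[0-9]")

# Both columns are still present; route now holds "r1" and "r2".
names(dat2)
#> [1] "route"  "weight"
```

The same effect can be had with mutate(), which also returns the full data frame with the modified column.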