My regex knowledge is poor, and I don't know how to remove the dot and the digits before it from strings in R.
Here is a short example. I have a vector a.
a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
'5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')
a
I want to remove the dot and the digits before it using str_remove() from the stringr package, but I don't know how to write the regex.
The final result should look like this:
a_expected = c('age41_50', 'age51_60', 'age61_70', 'age71_80',
'age1_20', 'age21_30', 'age31_40', 'ageupwith65', 'agelo65', 'PM2_5')
Any help will be highly appreciated!
CodePudding user response:
library(tidyverse)
library(rebus)
#>
#> Attaching package: 'rebus'
#> The following object is masked from 'package:stringr':
#>
#> regex
#> The following object is masked from 'package:ggplot2':
#>
#> alpha
a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
'5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')
# * means zero or more
str_replace(a, START %R% DIGIT %R% '*' %R% DOT, '')
#> [1] "age41_50" "age51_60" "age61_70" "age71_80" "age1_20"
#> [6] "age21_30" "age31_40" "ageupwith65" "agelo65" "PM2_5"
# or
# + means one or more
str_replace(a, '^\\d+\\.', '')
#> [1] "age41_50" "age51_60" "age61_70" "age71_80" "age1_20"
#> [6] "age21_30" "age31_40" "ageupwith65" "agelo65" "PM2_5"
Created on 2021-11-18 by the reprex package (v2.0.1)
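If loading stringr or rebus is not wanted, the same idea can be sketched in base R. This is only a minimal sketch, assuming the prefix is always one or more digits followed by a literal dot; the name a_clean is made up for illustration.

```r
# Same vector as in the question
a <- c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
       '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65',
       '9.agelo65', '10.PM2_5')

# "^" anchors the match at the start of each string,
# "[0-9]+" matches one or more digits, "\\." matches a literal dot.
# sub() replaces only the first match in each element.
a_clean <- sub("^[0-9]+\\.", "", a)
a_clean
```

Because the pattern is anchored with ^, the digits later in each string (such as the 2 in PM2_5) are left alone.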
CodePudding user response:
# escape the dot so it matches a literal "." rather than any character
str_remove(a, pattern = "^\\d+\\.")
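A quick check of why escaping the dot matters: in a regex, an unescaped . matches any single character, so a pattern like "^\\d*." also strips the first character of a string that has no numeric prefix. The vector x below is a made-up example, not data from the question.

```r
library(stringr)

# Hypothetical input: the second element has no numeric prefix
x <- c("10.PM2_5", "age41_50")

# Unescaped dot: "\\d*" can match nothing, then "." eats the leading "a"
unescaped <- str_remove(x, "^\\d*.")
unescaped
#> [1] "PM2_5"   "ge41_50"

# Escaped dot: only digits followed by a literal dot are removed
escaped <- str_remove(x, "^\\d+\\.")
escaped
#> [1] "PM2_5"    "age41_50"
```

On the vector from the question both patterns give the same result, since every element starts with digits and a dot; the difference only shows up on inputs without that prefix.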