How to remove the dot and the numbers before the dot in R with regex

Time:11-19

My regex knowledge is poor, and I don't know how to remove the dot and the numbers before the dot in R with regex.

Here is a short example. I have a vector a:

a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
      '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')
a

I want to remove the dot and the numbers before the dot using str_remove() from the stringr package (loaded with the tidyverse), but I don't know how to write the regex.

The final result should look like this:

a_expected = c('age41_50', 'age51_60', 'age61_70', 'age71_80',
      'age1_20', 'age21_30', 'age31_40', 'ageupwith65', 'agelo65', 'PM2_5')

Any help will be highly appreciated!

CodePudding user response:

library(tidyverse)
library(rebus)
#> 
#> Attaching package: 'rebus'
#> The following object is masked from 'package:stringr':
#> 
#>     regex
#> The following object is masked from 'package:ggplot2':
#> 
#>     alpha

a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
      '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')

# * means zero or more

str_replace(a, START %R% DIGIT %R% '*' %R% DOT, '') 
#>  [1] "age41_50"    "age51_60"    "age61_70"    "age71_80"    "age1_20"    
#>  [6] "age21_30"    "age31_40"    "ageupwith65" "agelo65"     "PM2_5"


#or


# + means one or more
str_replace(a, '^\\d+\\.', '')
#>  [1] "age41_50"    "age51_60"    "age61_70"    "age71_80"    "age1_20"    
#>  [6] "age21_30"    "age31_40"    "ageupwith65" "agelo65"     "PM2_5"

Created on 2021-11-18 by the reprex package (v2.0.1)
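
If you would rather not load any packages, the same idea can be sketched in base R with sub(). This is only an illustration and not part of the answer above; the name `cleaned` is made up for the example. The pattern anchors at the start of the string (^), matches one or more digits ([0-9]+), and then a literal dot (\\.).

```r
# Same input vector as in the question
a <- c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
       '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65',
       '9.agelo65', '10.PM2_5')

# sub() replaces only the first match in each string, which is all we need
# because the pattern is anchored to the start with ^
cleaned <- sub("^[0-9]+\\.", "", a)
cleaned
```

Because the match is anchored, digits and dots later in the string (for example the "2" in "PM2_5") are left alone.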

CodePudding user response:

str_remove(a, pattern = "^\\d*\\.")
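
One thing to watch with patterns like this: an unescaped . matches any character, not just a dot. A small sketch of the difference, using base R sub() so it runs without extra packages (the vector `x` and the result names are hypothetical, chosen only to show the edge cases):

```r
# Hypothetical inputs: one with a numeric prefix, two without a "number." prefix
x <- c("10.PM2_5", "PM2_5", "3age")

# Unescaped dot: "." matches any single character, so strings without
# a leading "number." still lose characters
unescaped <- sub("^\\d*.", "", x)

# Escaped dot with one or more digits: only a real "number." prefix is removed
escaped <- sub("^\\d+\\.", "", x)

unescaped  # "PM2_5" "M2_5"  "ge"
escaped    # "PM2_5" "PM2_5" "3age"
```

For the vector in the question both forms give the same result, since every element starts with digits followed by a dot. The escaped form is simply safer if some elements might not have the prefix.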