I would like to replace some numbers in the text column of my data. The numbers have either 8 or 9 digits
and appear in two formats (separated by hyphens or by spaces). This is a snapshot of the data:
library(dplyr)  # needed for %>% and select()

df <- data.frame(
  notes = c(
    'my number is 123-41-567',
    '321 12 788 is valid',
    'why not taking 987-012-678',
    '120 967 325 is correct'
  )
)
df %>% select(notes)
notes
1 my number is 123-41-567
2 321 12 788 is valid
3 why not taking 987-012-678
4 120 967 325 is correct
I need to replace them all with a term such as aaaaa. Hence, the data should look like:
notes
1 my number is aaaaa
2 aaaaa is valid
3 why not taking aaaaa
4 aaaaa is correct
Thank you in advance!
CodePudding user response:
Assuming the examples really do cover all possible cases (I would be careful about that), you can do this with the following regular expression:
\\d{3}( |-)\\d{2,3}( |-)\\d{3}
Here's the code for replacing:
library(dplyr)
library(stringr)
df %>%
  mutate(
    notes = str_replace_all(notes, '\\d{3}( |-)\\d{2,3}( |-)\\d{3}', 'XXXXXX')
  )
notes
1 my number is XXXXXX
2 XXXXXX is valid
3 why not taking XXXXXX
4 XXXXXX is correct
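If the examples turn out not to cover every case, a slightly stricter variant may be safer. The sketch below is an assumption on my part, not something the question requires: it adds word boundaries so that a match cannot start or end in the middle of a longer run of digits, and it uses a backreference so that both separators must be the same character. The names pattern_strict and cleaned, and the two extra test strings, are made up for illustration.

```r
library(stringr)

# Hypothetical stricter pattern (an assumption, not part of the answer above):
#   \\b      word boundary, so digits inside a longer number are not matched
#   ([ -])   captures the first separator (a space or a hyphen)
#   \\1      requires the second separator to be the same as the first
pattern_strict <- "\\b\\d{3}([ -])\\d{2,3}\\1\\d{3}\\b"

notes <- c(
  "my number is 123-41-567",
  "321 12 788 is valid",
  "call 1234-56-7890 later",      # longer number: left alone
  "mixed 123-41 567 separators"   # inconsistent separators: left alone
)

cleaned <- str_replace_all(notes, pattern_strict, "aaaaa")
print(cleaned)
```

The first two strings are replaced exactly as before, while the last two are left unchanged. With the original pattern, both of them would have been partially replaced, which may or may not be what you want depending on your data.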