I am using a regex that is suggested here to replace any type of phone number with aaaaaaaaaa.
This is a snapshot of my data:
library(dplyr)    # provides %>%, select() and mutate()
library(stringr)  # provides str_replace_all()

df <- data.frame(
  text = c(
    'my number is (123)-416-567',
    "1 321 124 7889 is valid",
    'why not taking 987-012-6782',
    '120 967 3256 is correct',
    'call at 888 969 9919',
    'please text at 1 647 989 1213'
  )
)
df %>% select(text)
text
1 my number is (123)-416-567
2 1 321 124 7889 is valid
3 why not taking 987-012-6782
4 120 967 3256 is correct
5 call at 888 969 9919
6 please text at 1 647 989 1213
My code is:
df %>%
  mutate(
    text = str_replace_all(text, '^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$', 'aaaaaaaaaa')
  )
and I get this error:
Error: '\+' is an unrecognized escape in character string starting "'^(\+"
Error: unexpected ')' in " )"
The outcome should be like
text
1 my number is aaaaaaaaaa
2 aaaaaaaaaa is valid
3 why not taking aaaaaaaaaa
4 aaaaaaaaaa is correct
5 call at aaaaaaaaaa
6 please text at aaaaaaaaaa
CodePudding user response:
You can use
str_replace_all(text, '(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)', 'aaaaaaaaaa')
See the regex demo.
Details:
- (?:\+?\d{1,2}\s)? - an optional sequence of an optional +, one or two digits and a whitespace
- \(? - an optional (
- \d{3} - three digits
- \)? - an optional )
- [\s.-] - a -, . or whitespace
- \d{3} - three digits
- [\s.-] - a -, . or whitespace
- \d{3,4} - three or four digits
- (?!\d) - no digit allowed right after.
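As a quick check of the pattern described above, here is a minimal sketch that applies it to the sample data from the question. It swaps str_replace_all() for base R gsub() with perl = TRUE so that it runs without extra packages, and it assumes the optional prefix is a literal + followed by one or two digits. The names phone_pattern and masked are illustrative only.

```r
# Sample data copied from the question
df <- data.frame(
  text = c(
    "my number is (123)-416-567",
    "1 321 124 7889 is valid",
    "why not taking 987-012-6782",
    "120 967 3256 is correct",
    "call at 888 969 9919",
    "please text at 1 647 989 1213"
  ),
  stringsAsFactors = FALSE
)

# Pattern from the answer (assumption: the optional prefix is "+" plus 1-2 digits).
# Every regex backslash is doubled because it sits inside an R string literal.
phone_pattern <- "(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"

# perl = TRUE selects the PCRE engine, which supports the (?!...) lookahead
masked <- gsub(phone_pattern, "aaaaaaaaaa", df$text, perl = TRUE)
print(masked)
```

With stringr loaded, the same pattern string can be passed to str_replace_all() inside mutate(), as in the answer.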
Notes:
- In a string literal, a regex backslash must be written as a double \ char
- ^ and $ match the start/end of the string, so in this case it makes sense to remove the ^ anchor and replace $ with a right-hand digit boundary, (?!\d)
- The last \d{4} did not match numbers where the last part contained only three digits, so I replaced it with \d{3,4}.
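The three notes can be illustrated with a small base R sketch. It uses grepl() with perl = TRUE rather than stringr, and the variable names (anchored, four_only, three_or_four) are made up for this example; the anchored pattern assumes the question's optional prefix is a literal +.

```r
# Note 1: "\\d" in R source is a two-character string, a backslash and a d
escaped_len <- nchar("\\d")
writeLines("\\d{3}")  # shows the text the regex engine actually receives

# Note 2: with ^ and $ the whole string must be a phone number
anchored <- "^(\\+\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}$"
whole_string <- grepl(anchored, "888 969 9919", perl = TRUE)          # TRUE
inside_text  <- grepl(anchored, "call at 888 969 9919", perl = TRUE)  # FALSE

# Note 3: \d{4} misses a three-digit last group, \d{3,4} accepts it
four_only     <- "\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}"
three_or_four <- "\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"
short_with_four     <- grepl(four_only, "my number is (123)-416-567", perl = TRUE)      # FALSE
short_with_three_4  <- grepl(three_or_four, "my number is (123)-416-567", perl = TRUE)  # TRUE

# The (?!\d) lookahead rejects a last group that runs on to a fifth digit
too_long <- grepl(three_or_four, "987-012-67825", perl = TRUE)  # FALSE
```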