I'm trying to extract the country codes and move them into a new column.
Example data
data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
I only have a start so far. For instance, I can extract the + sign, but I'm trying to find out how to detect +1, +257, +91, etc.
library(dplyr)
library(stringr)
data |>
  mutate(country_code = str_extract(phone, "[:symbol:]"))
phone            country_code
+1 800 000 000   +
+257000000000    +
+91-00 000 00    +
200000 000       NA
What I'm trying to achieve:
phone            country_code
+1 800 000 000   +1
+257000000000    +257
+91-00 000 00    +91
200000 000       NA
I'm wondering if I can match possible country codes based on another vector where I specify the different variations, like this: codes <- c(1, 257, 91), or like this: codes <- c("+1", "+257", "+91").
CodePudding user response:
Does this work:
library(dplyr)
library(stringr)
codes <- c(1, 257, 91)  # numeric variant from the question
data %>% mutate(country_code = str_extract(phone, str_c('\\+', codes, collapse = '|')))
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>
CodePudding user response:
Since + is a special character in a regular expression, you have to add \\ to escape it. You can search for any of your pre-designated codes by first concatenating all of them with the "or" symbol (|), then using the stringr package's str_match:
srch <- paste0("\\",paste(codes, collapse = "|\\"))
# [1] "\\+1|\\+257|\\+91"
stringr::str_match(data$phone, srch)
Output:
[,1]
[1,] "+1"
[2,] "+257"
[3,] "+91"
[4,] NA
Data
data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")
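If the goal is the bare code without the plus sign, the same "escape the +, join with |" idea can be combined with a capture group. The following is only a sketch in base R (so it runs without extra packages); the "^" anchor and the step that strips the leading "+" from codes are assumptions added here, not part of the answer above.

```r
# Same example data as above.
phone <- c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000")
codes <- c("+1", "+257", "+91")

# Drop the leading "+" from each code, then build "^\\+(1|257|91)".
# The "^" anchor is an added assumption: a code only counts at the start.
digits <- sub("^\\+", "", codes)
pat <- paste0("^\\+(", paste(digits, collapse = "|"), ")")

# regexec() reports the full match followed by each capture group.
m <- regmatches(phone, regexec(pat, phone))

# Keep the first capture group, or NA when the number has no known code.
country_code <- vapply(
  m,
  function(x) if (length(x) >= 2) x[2] else NA_character_,
  character(1),
  USE.NAMES = FALSE
)
country_code
```

The result can be assigned straight to a new column, for example data$country_code <- country_code.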
CodePudding user response:
Using base R
# assumes the numeric variant from the question: codes <- c(1, 257, 91)
pat <- sprintf("\\+(%s)", paste(codes, collapse = "|"))
i1 <- grepl(pat, data$phone)
data$country_code[i1] <- regmatches(data$phone[i1], regexpr(pat, data$phone[i1]))
-output
> data
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>
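One case the patterns above do not address is a code list in which one entry is a prefix of another. The sketch below uses a hypothetical list ("1" and "1242") and hypothetical phone numbers to illustrate one possible guard: sort the codes longest first and anchor the pattern at the start. It uses perl = TRUE, where alternation branches are tried left to right; treat it as an illustration under those assumptions rather than a required step.

```r
# Hypothetical code list in which one entry ("1") is a prefix of another ("1242").
codes <- c("1", "1242", "91")
phone <- c("+1242 555 0000", "+1 800 000 000", "+91-00 000 00", "200000 000")

# Put longer codes first so the more specific alternative is tried first.
ordered <- codes[order(nchar(codes), decreasing = TRUE)]
pat <- paste0("^\\+(?:", paste(ordered, collapse = "|"), ")")

# perl = TRUE makes alternation try the branches from left to right.
hit <- regexpr(pat, phone, perl = TRUE)

# Start with NA everywhere, then fill in the rows that matched.
country_code <- rep(NA_character_, length(phone))
country_code[hit != -1] <- regmatches(phone, hit)
country_code
```

With the codes left in their original order, the first phone number would be matched as "+1" under perl = TRUE, because the shorter branch is tried first.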