How to extract country codes from phone numbers?

Time:11-12

I'm trying to extract the country codes and move them into a new column.

Example data

data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))

I only have a start so far. For instance, I can extract the plus sign, but I'm trying to figure out how to detect the codes themselves (1, 257, 91, etc.).

library(dplyr)
library(stringr)

data |> 
  mutate(country_code = str_extract(phone, "[:symbol:]"))
phone            country_code
+1 800 000 000   +
+257000000000    +
+91-00 000 00    +
200000 000       NA

What I'm trying to achieve:

phone            country_code
+1 800 000 000   1
+257000000000    257
+91-00 000 00    91
200000 000       NA

I'm wondering if I can match possible country codes against another vector in which I specify the different variations, like this: codes <- c(1, 257, 91), or like this: codes <- c("+1", "+257", "+91").

CodePudding user response:

Does this work:

library(dplyr)
library(stringr)

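# numeric codes from the question; the literal "+" is added (escaped) in the pattern
codes <- c(1, 257, 91)

data %>% mutate(country_code = str_extract(phone, str_c('\\+', codes, collapse = '|')))
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>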

CodePudding user response:

Since + is a special character in regular expressions, you have to add \\ to escape it. You can search for any of your pre-designated codes by first concatenating them with the "or" symbol (|), then using the stringr package's str_match:

srch <- paste0("\\",paste(codes, collapse = "|\\"))
# [1] "\\+1|\\+257|\\+91"

stringr::str_match(data$phone, srch)

Output:

     [,1]  
[1,] "+1"  
[2,] "+257"
[3,] "+91" 
[4,] NA 

Data

data <- data.frame(phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"))
codes <- c("+1", "+257", "+91")
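One caveat with a plain alternation like the one above: it can match anywhere in the string, and in engines that try alternatives left to right a short code such as +1 may win over a longer one that starts with the same digit. The following is a minimal base R sketch of one way to guard against both, assuming numeric codes; the extra code 12 and the fifth phone number are hypothetical and added only to show the ordering effect.

```r
# Hypothetical inputs: 12 and "+12 345 678" are illustrative additions,
# not part of the original data.
codes <- c(1, 12, 257, 91)
phone <- c("+1 800 000 000", "+257000000000", "+91-00 000 00",
           "200000 000", "+12 345 678")

# Sort codes longest first so "12" is tried before "1".
codes_chr <- as.character(codes)
by_length <- codes_chr[order(nchar(codes_chr), decreasing = TRUE)]

# "^" anchors the match at the start; "\\+" matches a literal plus sign.
pat <- paste0("^\\+(", paste(by_length, collapse = "|"), ")")

# regexpr() returns -1 where nothing matches, so fill only the hits.
m <- regexpr(pat, phone)
country_code <- rep(NA_character_, length(phone))
country_code[m > 0] <- regmatches(phone, m)

print(country_code)
```

Anchoring with ^ also keeps a number like "200000 +1" from being treated as having a country code, which may or may not be what you want for your data.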

CodePudding user response:

Using base R

# numeric codes from the question; the literal "+" is escaped in the pattern
codes <- c(1, 257, 91)
pat <- sprintf("\\+(%s)", paste(codes, collapse = "|"))
i1 <- grepl(pat, data$phone)
data$country_code[i1] <-  regmatches(data$phone[i1], regexpr(pat, data$phone[i1]))

-output

> data
           phone country_code
1 +1 800 000 000           +1
2  +257000000000         +257
3  +91-00 000 00          +91
4     200000 000         <NA>
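If the goal is the bare digits from the question's desired output (1, 257, 91) rather than the code with its plus sign, one possible variation is to strip the sign after matching. This is only a sketch in base R, assuming the numeric codes vector from the question and that a country code always appears at the start of the string:

```r
data <- data.frame(
  phone = c("+1 800 000 000", "+257000000000", "+91-00 000 00", "200000 000"),
  stringsAsFactors = FALSE
)
codes <- c(1, 257, 91)

# Match a literal "+" followed by one of the known codes, at the start only.
pat <- sprintf("^\\+(%s)", paste(codes, collapse = "|"))
m <- regexpr(pat, data$phone)

# Start with NA everywhere, then fill the rows that matched,
# dropping the leading "+" from each match.
data$country_code <- NA_character_
data$country_code[m > 0] <- sub("+", "", regmatches(data$phone, m), fixed = TRUE)

print(data)
```

The result is a character column; wrapping the right-hand side in as.integer() would give numeric codes instead, if that suits the rest of your pipeline better.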