Remove the digit at position n of each number in a string of slash-separated numbers

Time:03-04

I have a character column with this configuration:

data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"))

I want to remove the 6th digit of any number within the string. The numbers are always 10 characters long.

My code works. However, I believe it could be done more easily with a regex, but I couldn't figure it out.

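library(stringr)

remove_6_digit <- function(x){
  # positions of every "/" separator in the string
  idxs <- str_locate_all(x, "/")[[1]][, 1]

  # the 6th digit of each following number sits 7 characters after its "/"
  # (" / " is 3 characters wide); the first number's 6th digit is at position 6.
  # Remove from right to left so that earlier positions do not shift.
  for (idx in c(rev(idxs + 7), 6)){
    str_sub(x, idx, idx) <- ""
  }
  return(x)
}

result <- sapply(data$codes, remove_6_digit, USE.NAMES = FALSE)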
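If the separator is always exactly " / " (an assumption taken from the sample data), the same position-based idea can be vectorised without a loop: split each string, keep characters 1-5 and 7 onward of every number, and paste the pieces back together. `drop_6th_char` is a hypothetical helper name used only for this sketch.

```r
# Hypothetical helper: drop the 6th character of every number in each string.
# Assumes the numbers are always separated by exactly " / ".
drop_6th_char <- function(codes) {
  parts <- strsplit(codes, " / ", fixed = TRUE)
  vapply(parts, function(nums) {
    # keep characters 1-5 and 7 to the end of each number, then rejoin
    paste(paste0(substr(nums, 1, 5), substring(nums, 7)), collapse = " / ")
  }, character(1))
}

data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"),
  stringsAsFactors = FALSE)

result <- drop_6th_char(data$codes)
# expected values: "0800101001", "0800201002 / 0800201003 / 1713404034",
#                  "0800401005 / 0800501001"
```

Unlike a regex, this counts characters rather than digits, so it only makes sense while every code is a plain run of digits.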

Answer:

You can use

gsub("\\b(\\d{5})\\d", "\\1", data$codes)

This removes the 6th digit, counted from the start of each digit sequence.

Details:

  • \b - word boundary
  • (\d{5}) - Capturing group 1 (\1): five digits
  • \d - a digit (the 6th one; it is consumed but not captured, so the \1 replacement drops it).
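A quick check on the question's sample data (the data frame is re-created so the snippet runs on its own). It uses R's default regex engine, exactly as in the call above:

```r
data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"),
  stringsAsFactors = FALSE)

# \b anchors the match at the start of each number, (\d{5}) keeps digits 1-5,
# and the consumed-but-uncaptured \d (digit 6) is left out of the replacement.
result <- gsub("\\b(\\d{5})\\d", "\\1", data$codes)
# expected values: "0800101001", "0800201002 / 0800201003 / 1713404034",
#                  "0800401005 / 0800501001"
```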

While a word boundary is enough for the current data, a digit boundary is an alternative in case the numbers are glued to word characters (letters, digits or underscores):

gsub("(?<!\\d)(\\d{5})\\d", "\\1", data$codes, perl=TRUE)

where perl=TRUE enables the PCRE regex syntax and (?<!\d) is a negative lookbehind that fails the match if there is a digit immediately to the left of the current location.
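To illustrate the difference, here is a sketch with a hypothetical code glued to a letter prefix, "ID08001301001" (not part of the question's data). Both calls use perl = TRUE so that only the boundary construct differs:

```r
glued <- "ID08001301001"  # hypothetical input: digits glued to word characters

# \b finds no word boundary between "D" and "0", so nothing is removed
with_word_boundary <- gsub("\\b(\\d{5})\\d", "\\1", glued, perl = TRUE)
# value: "ID08001301001" (unchanged)

# (?<!\d) only requires that no digit precedes the match, so the 6th digit goes
with_digit_boundary <- gsub("(?<!\\d)(\\d{5})\\d", "\\1", glued, perl = TRUE)
# value: "ID0800101001"
```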

And if you must only change digit sequences of exactly 10 digits (no shorter and no longer), you can use

gsub("\\b(\\d{5})\\d(\\d{4})\\b", "\\1\\2", data$codes)
gsub("(?<!\\d)(\\d{5})\\d(?=\\d{4}(?!\\d))", "\\1", data$codes, perl=TRUE)

One remark, though: your numbers actually consist of 11 digits, so you need to replace \\d{4} with \\d{5}.
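With that change applied, here is a sketch of both exact-length variants for the question's 11-digit codes. The 12-digit string is a made-up value, included only to show that longer numbers are left alone:

```r
codes <- c("08001301001",   # 11 digits: 6th digit removed
           "080013010012")  # 12 digits (hypothetical): left unchanged

# \d{4} replaced by \d{5}, so only runs of exactly 5 + 1 + 5 = 11 digits match
exact_wb <- gsub("\\b(\\d{5})\\d(\\d{5})\\b", "\\1\\2", codes)
exact_lookaround <- gsub("(?<!\\d)(\\d{5})\\d(?=\\d{5}(?!\\d))", "\\1", codes, perl = TRUE)
# both should give: "0800101001" "080013010012"
```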

Answer:

Another possible solution, using stringr::str_replace_all and lookarounds:

library(tidyverse)

data %>% 
  mutate(codes = str_replace_all(codes, "(?<=\\d{5})\\d(?=\\d{5})", ""))

#>   id                                codes
#> 1  1                           0800101001
#> 2  2 0800201002 / 0800201003 / 1713404034
#> 3  3              0800401005 / 0800501001
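If you would rather not load the tidyverse, the same lookaround pattern should also work in base R with gsub(perl = TRUE), since PCRE accepts a fixed-length lookbehind such as (?<=\d{5}). Like the stringr version, it only behaves as intended for 11-digit numbers, because it removes any digit with at least 5 digits on each side:

```r
data <- data.frame(
  id = 1:3,
  codes = c("08001301001", "08002401002 / 08002601003 / 17134604034", "08004701005 / 08005101001"),
  stringsAsFactors = FALSE)

# Remove a digit preceded by at least 5 digits and followed by at least 5 digits.
data$codes <- gsub("(?<=\\d{5})\\d(?=\\d{5})", "", data$codes, perl = TRUE)
data
```

Printing data should reproduce the table shown above.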