Ignore any number containing more than 3 repeated digits

Suppose I have the following IDs:

74876593476
74877777777
74884784633
74822228765
74878645421
74820201111

I want to ignore any number that contains the same digit more than 3 times consecutively, so the expected result is:

74876593476
74884784633
74878645421

CodePudding user response：

Using the regex from this post, you can use grep:

x <- c(74876593476, 74877777777, 74884784633, 74822228765, 74878645421, 74820201111)

grep('(\\d)\\1\\1\\1', x, invert = TRUE, value = TRUE)
#[1] "74876593476" "74884784633" "74878645421"

Or if you are a tidyverse fan, you can use str_subset from stringr with the same regex.

stringr::str_subset(x, '(\\d)\\1\\1\\1', negate = TRUE)
#[1] "74876593476" "74884784633" "74878645421"

This removes any number in which the same digit occurs more than 3 times consecutively.
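As a quick check of what the pattern does, here is a minimal sketch on a few of the IDs plus one made-up ID (74811128765, not in the question) that has a run of exactly three. The perl = TRUE argument is an assumption added here to make the backreference handling explicit; it is not part of the original call.

```r
ids <- c("74876593476", "74877777777", "74822228765", "74811128765")

# (\\d) captures one digit; \\1{3} requires three more copies right after it,
# so only runs of four or more identical digits match
has_run <- grepl("(\\d)\\1{3}", ids, perl = TRUE)

# pull out the offending run from each ID that matched
runs <- regmatches(ids, regexpr("(\\d)\\1{3,}", ids, perl = TRUE))

print(data.frame(id = ids, has_run = has_run))
print(runs)
```

The made-up ID with only three consecutive 1s is kept, which is the "more than 3" behaviour the question asks for.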

CodePudding user response：

We can use grepl to subset x:

> x <- c(74876593476, 74877777777, 74884784633, 74822228765, 74878645421, 74820201111)

> x[!grepl("(\\d)\\1{3}", x)]
[1] 74876593476 74884784633 74878645421

CodePudding user response：

The regex answer is probably better, but here's an alternative using strsplit and rle.

x <- c(74876593476, 74877777777, 74884784633, 74822228765, 74878645421, 74820201111)

x[sapply(strsplit(as.character(x), ""), \(d) !any(rle(d)$lengths > 3))]

#[1] 74876593476 74884784633 74878645421
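To see why rle works here, it may help to look at a single ID. The sketch below is only an illustration of the run-length idea on one value from the question; the variable names are made up.

```r
# split one ID into single characters
digits <- strsplit("74822228765", "")[[1]]

# rle() collapses consecutive equal elements into (value, length) pairs
r <- rle(digits)
print(r$values)
print(r$lengths)

# the ID is rejected when any run is longer than 3
rejected <- any(r$lengths > 3)
print(rejected)
```

Here the four consecutive 2s show up as a single run of length 4, so the ID is rejected.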

CodePudding user response：

A solution that avoids converting to characters.

fNoRep <- function(x, k = 3L) {
  n <- ceiling(log10(x)) + 1L
  # get the digits as integers, plus an extra digit for each value
  i <- as.integer((rep.int(x, n)/10^sequence(n, 0)) %% 10)
  # set the extra digit to 10 in order to separate the values
  i[cs <- cumsum(n)] <- 10L
  # use rle to find runs longer than k
  lens <- rle(i)$lengths
  x[-unique(findInterval(cumsum(lens)[which(lens > k)], cs)) - 1L]
}

x <- c(74876593476, 74877777777, 74884784633, 74822228765, 74878645421, 74820201111, 91526000000)
fNoRep(x)
#> [1] 74876593476 74884784633 74878645421
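The digit extraction is the core of this approach, so here is a simplified sketch of it for a single value. It is not the function above: it assumes one positive whole number and uses floor(log10(x)) + 1 to count digits exactly, without the extra separator digit.

```r
x <- 74876593476

# number of decimal digits in x
n <- floor(log10(x)) + 1

# integer-divide by each power of ten, then take the last digit
digits <- (x %/% 10^((n - 1):0)) %% 10
print(digits)

# longest run of identical consecutive digits
longest <- max(rle(digits)$lengths)
print(longest)
```

Because everything stays numeric, how R would print the value never comes into play.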

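Compare this with the grep solution, which fails to remove values with trailing zeros such as 91526000000, because the numeric-to-character conversion can render them in scientific notation (e.g. "9.1526e+10").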

fNoRepGrep <- function(x, k = 3L) as.numeric(grep(sprintf("(\\d)\\1{%d}", k), x, invert = TRUE, value = TRUE))
fNoRepGrep(x)
#> [1] 74876593476 74884784633 74878645421 91526000000
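If you prefer to stay with the regex, one possible workaround is to control the conversion to character yourself. The sketch below uses sprintf("%.0f", ...), which is a substitution made here and not part of the answers above; it assumes the IDs are whole numbers small enough to be represented exactly as doubles.

```r
x <- c(74876593476, 74877777777, 91526000000)

# sprintf("%.0f", ...) writes every digit out, so round values such as
# 91526000000 are not abbreviated to scientific notation
s <- sprintf("%.0f", x)

# same pattern as before, applied to the explicit strings
kept <- x[!grepl("(\\d)\\1{3}", s)]
print(kept)
```

Storing the IDs as character from the start avoids the problem altogether.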

In the benchmark below, the math-based solution is about twice as fast as the grep solution.

x <- sample(1e10:(1e11 - 1), 1e4)
microbenchmark::microbenchmark(math = fNoRep(x),
                               grep = fNoRepGrep(x))
#> Unit: milliseconds
#>  expr     min       lq     mean   median       uq     max neval
#>  math  7.5738  9.03255 10.30973  9.38525 11.81905 16.7631   100
#>  grep 19.9207 20.19140 20.67160 20.48535 20.94270 23.1786   100

CodePudding user response：

Convert the IDs to strings, use grepl() to flag those containing the listed triple-repeated digits, then use filter from dplyr to drop every flagged entry. The | in the pattern lets grepl() match any one of several alternatives. Note that this pattern covers only the digits 1, 2 and 7, and it matches runs of three rather than runs longer than three.

library(tidyverse)

data <- tibble(id = as.character(c(74876593476, 74877777777, 11111, 74884784633, 74822228765, 74878645421, 74820201111)))

# flag IDs containing any of the listed runs, then keep the unflagged ones
output <- data %>%
  mutate(triple = grepl(x = id, pattern = "111|222|777")) %>%
  filter(triple == FALSE)
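The alternation idea can be extended to every digit without typing the pattern by hand. The following is a base R sketch, not the code above: it builds the pattern with strrep and paste, and uses runs of four so that only IDs with more than 3 repeats are dropped. The names ids, pattern and kept are illustrative.

```r
ids <- c("74876593476", "74877777777", "11111", "74884784633",
         "74822228765", "74878645421", "74820201111")

# "0000|1111|...|9999": one alternative per digit, each a run of four
pattern <- paste(strrep(0:9, 4), collapse = "|")

# keep only the IDs that match none of the alternatives
kept <- ids[!grepl(pattern, ids)]
print(kept)
```

The same pattern string could be passed to grepl() inside mutate() if you want to stay in a dplyr pipeline.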