How to find words in a string that have consecutive repeated letters in R

I have a problem that I do not know how to solve.

I need to write a function that returns all words from a string that contain consecutive repeated letters, along with the maximum number of consecutive repetitions of a letter in a word.

For example, "hello good home aboba" should give "hello good" after processing, and the maximum number of repetitions of a character in that string is 2.

The code I wrote tries to find the repeated characters and, based on that, collect the matching words into a separate vector, but something doesn't work. Please help me solve the problem.

library(tidyverse)
library(stringr)   

text = 'tessst gfvdsvs bbbddsa daxz'
text = strsplit(text, ' ')
text

new = c()
new_2 = c()

for (i in text){
  
  new = str_extract_all(i, '([[:alpha:]])\\1+')
  if (new != character(0)){
    new_2 = c(new_2, i)
  }
}
new
new_2

Output:

Error in if (new != character(0)) { : argument is of length zero
> new
[[1]]
[1] "sss"

[[2]]
character(0)

[[3]]
[1] "bbb" "dd" 

[[4]]
character(0)

> new_2
NULL

CodePudding user response：

text = "hello good home aboba"

paste0(
  grep("(.)\\1{1,}", 
       unlist(strsplit(text, " ")), 
       value = TRUE),
  collapse = " ")

[1] "hello good"
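
The call above returns only the matching words. For the second part of the question, the longest run of one repeated character, a minimal base R sketch could use rle() on the characters of each word. The helper name max_run is hypothetical and not part of the original answer, and the sketch assumes at least one word matches.

```r
# Hypothetical helper: length of the longest run of one repeated character in a word
max_run <- function(word) {
  chars <- strsplit(word, "")[[1]]
  max(rle(chars)$lengths)
}

text <- "hello good home aboba"
words <- unlist(strsplit(text, " "))

# Keep only words with at least two identical consecutive characters
repeated <- grep("(.)\\1", words, value = TRUE)

# Longest run across the kept words (assumes `repeated` is not empty)
longest <- max(vapply(repeated, max_run, integer(1)))
```

Here repeated holds "hello" and "good", and longest is 2.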

CodePudding user response：

You can use

new <- unlist(str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*"))
i <- max(nchar(unlist(str_extract_all(new, "(.)\\1+"))))

With str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*") you extract all words containing at least two consecutive identical letters, and with max(nchar(unlist(str_extract_all(new, "(.)\\1+")))) you get the length of the longest run of a repeated letter.

R demo:

library(stringr)
text <- 'tessst gfvdsvs bbbddsa daxz'
new <- unlist(str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*"))
# => [1] "tessst"  "bbbddsa"
i <- max(nchar(unlist(str_extract_all(new, "(.)\\1+"))))
# => [1] 3
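
Since the question asks for a function, the two calls above could be wrapped as follows. This is only a sketch: the name find_repeats and the shape of the returned list are assumptions, and returning 0 when no word matches is one possible choice.

```r
library(stringr)

# Hypothetical wrapper around the two steps shown above
find_repeats <- function(text) {
  # Words containing a letter that is immediately repeated at least once
  words <- unlist(str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*"))
  # All runs of a repeated letter inside those words
  runs <- unlist(str_extract_all(words, "(\\p{L})\\1+"))
  list(
    words = words,
    # Assumption: report 0 when no word contains a repeated letter
    max_repeats = if (length(runs) > 0) max(nchar(runs)) else 0L
  )
}

res <- find_repeats("hello good home aboba")
```

With this input, res$words holds "hello" and "good", and res$max_repeats is 2.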

Regex details:

\p{L}* - zero or more letters
(\p{L}) - a letter captured into Group 1
\1+ - one or more repetitions of the captured letter
\p{L}* - zero or more letters
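
As for the error in the question: strsplit() returns a list, so for (i in text) runs once with i set to the whole vector of words, and comparing the resulting list with character(0) gives a zero-length logical, which if() rejects. A minimal sketch of the same loop with those two points changed might look like this (the variable names words, w and runs are illustrative):

```r
library(stringr)

text <- "tessst gfvdsvs bbbddsa daxz"
# strsplit() returns a list; [[1]] takes the character vector of words
words <- strsplit(text, " ")[[1]]

new_2 <- character(0)
for (w in words) {
  # Runs of a repeated letter in this one word
  runs <- str_extract_all(w, "([[:alpha:]])\\1+")[[1]]
  # Test the length instead of comparing against character(0)
  if (length(runs) > 0) {
    new_2 <- c(new_2, w)
  }
}
```

After the loop, new_2 holds "tessst" and "bbbddsa".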