There is a problem that I do not know how to solve.
I need to write a function that returns all words in a string that contain consecutively repeated letters, together with the maximum number of times a letter is repeated in a row within a word.
For example, the string
"hello good home aboba"
should become "hello good" after processing, and the maximum number of consecutive repetitions of a character in that string is 2.
The code I wrote tries to find repeated characters and, based on that, collect the matching words into a separate vector, but something doesn't work. Please help me solve the problem.
library(tidyverse)
library(stringr)
text = 'tessst gfvdsvs bbbddsa daxz'
text = strsplit(text, ' ')
text
new = c()
new_2 = c()
for (i in text){
  new = str_extract_all(i, '([[:alpha:]])\\1+')
  if (new != character(0)){
    new_2 = c(new_2, i)
  }
}
new
new_2
Output:
Error in if (new != character(0)) { : argument is of length zero
> new
[[1]]
[1] "sss"
[[2]]
character(0)
[[3]]
[1] "bbb" "dd"
[[4]]
character(0)
> new_2
NULL
CodePudding user response:
text = "hello good home aboba"
paste0(
grep("(.)\\1{1,}",
unlist(strsplit(text, " ")),
value = TRUE),
collapse = " ")
[1] "hello good"
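The snippet above returns only the words. The maximum number of repetitions that the question also asks for could be obtained in base R along the following lines. This is a minimal sketch, not a definitive implementation: the variable names words, hits, runs and max_run are illustrative, and perl = TRUE is an assumption made here so that the backreference is handled by the PCRE engine.

```r
text <- "hello good home aboba"
words <- unlist(strsplit(text, " ", fixed = TRUE))

# Keep only the words with at least two identical consecutive characters.
hits <- grep("(.)\\1", words, value = TRUE, perl = TRUE)

# Extract every run of a repeated character, e.g. "ll" and "oo".
runs <- unlist(regmatches(hits, gregexpr("(.)\\1+", hits, perl = TRUE)))

# Length of the longest run; 0 if no word contains a repeated character.
max_run <- if (length(runs) > 0) max(nchar(runs)) else 0L

paste(hits, collapse = " ")
# [1] "hello good"
max_run
# [1] 2
```

The guard on length(runs) avoids the warning that max() would raise on an empty vector when no word matches.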
CodePudding user response:
You can use
new <- unlist(str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*"))
i <- max(nchar(unlist(str_extract_all(new, "(.)\\1+"))))
With str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*") you extract all words containing at least two consecutive identical letters, and with max(nchar(unlist(str_extract_all(new, "(.)\\1+")))) you get the length of the longest chunk of repeated letters.
See the R demo online:
library(stringr)
text <- 'tessst gfvdsvs bbbddsa daxz'
new <- unlist(str_extract_all(text, "\\p{L}*(\\p{L})\\1+\\p{L}*"))
# => [1] "tessst" "bbbddsa"
i <- max(nchar(unlist(str_extract_all(new, "(.)\\1+"))))
# => [1] 3
See this regex demo. Regex details:
\p{L}* - zero or more letters
(\p{L}) - a letter captured into Group 1
\1+ - one or more repetitions of the captured letter
\p{L}* - zero or more letters
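As for the error in the question, it appears to come from two things. First, strsplit() returns a list, so for (i in text) runs only once, with i holding the whole vector of words. Second, comparing the result with character(0) yields a zero-length logical, which if() rejects with "argument is of length zero". A minimal sketch of a corrected loop follows. It swaps str_extract_all for base R's grepl so that it runs without extra packages, and the names words, kept and w are illustrative.

```r
text <- "tessst gfvdsvs bbbddsa daxz"

# strsplit() returns a list; [[1]] unwraps it into a character vector of words.
words <- strsplit(text, " ", fixed = TRUE)[[1]]

kept <- character(0)
for (w in words) {
  # grepl() returns a single TRUE/FALSE for each word, which is safe inside if().
  if (grepl("([[:alpha:]])\\1", w, perl = TRUE)) {
    kept <- c(kept, w)
  }
}

kept
# [1] "tessst"  "bbbddsa"
```

If the str_extract_all version is preferred, testing length(...) > 0 on the matches for a single word, rather than comparing against character(0), should avoid the same error.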