backreference into strcapture regex

Time:12-14

I have the following vector:

cpf <- "12345678910"

The following function works:

strcapture("(\\b.{3})(.{3})(.{3})(.{2}\\b)", cpf, 
       proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

   n1  n2  n3 n4
1 123 456 789 10

But when I add the backreference \\1, it does not work:

strcapture("(\\b.{3})\\1\\1(.{2}\\b)", cpf, 
       proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

CodePudding user response:

There might be a misunderstanding of what a \1 backreference means. When you include \1 in your regex pattern, it is referring to whatever was captured in the first capture group. So in the following input:

12345678910

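The first capture group would be 123. Because 123 is not repeated immediately afterward in the input, \1 cannot match and the whole pattern fails. Consider the following example, where the input does repeat and each \1 is wrapped in its own parentheses so that the pattern still has four capture groups for the four proto entries: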

cpf <- "12312312310"
strcapture("\\b(.{3})(\\1)(\\1)(.{2})\\b", cpf, 
    proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

In this case, the text captured by the first group, 123, is followed by two more copies of itself, so both \1 backreferences match.
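
If the goal was to avoid typing (.{3}) three times rather than to match repeated text, one option is to build the pattern string by repeating the sub-pattern itself. This is only a sketch of that idea, assuming R >= 3.4 (for strcapture); the variable names chunk, pattern, and parts are made up for illustration:

```r
cpf <- "12345678910"

# Repeat the sub-pattern text, not the captured value:
# strrep() pastes "(.{3})" three times into the regex string.
chunk <- "(.{3})"
pattern <- paste0("\\b", strrep(chunk, 3), "(.{2})\\b")

# The pattern has four capture groups, one per proto entry.
parts <- strcapture(pattern, cpf,
    proto = list(n1 = character(), n2 = character(),
                 n3 = character(), n4 = character()))
parts
```

This should split the original cpf into the same four pieces as the first call in the question, since the generated pattern contains the same four groups.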
