I have the following vector:
cpf <- "12345678910"
The following call works:
strcapture("(\\b.{3})(.{3})(.{3})(.{2}\\b)", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
n1 n2 n3 n4
1 123 456 789 10
But when I add the backreference \\1, it does not work:
strcapture("(\\b.{3})\\1\\1(.{2}\\b)", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
There might be a misunderstanding of what a \1 backreference means. When you include \1 in your regex pattern (written "\\1" inside an R string), it refers to the exact text that the first capture group matched, not to the group's sub-pattern. So in the following input:
12345678910
the first capture group would match 123. Because 123 never appears again later in the input, \1 can never match, so the pattern as a whole fails. Consider the following input, for which a backreference pattern does match:
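To make the distinction concrete, here is a small sketch using base R's grepl(), regexec() and regmatches() with the default (TRE) regex engine. The pattern keeps only the two real capture groups, so the output shows what \1 actually compares against; the variable name pattern is just for illustration.

```r
# \1 must match the exact text captured by group 1,
# not "any other three characters".
pattern <- "\\b(.{3})\\1\\1(.{2})\\b"

grepl(pattern, "12345678910")  # FALSE: "123" is not followed by "123123"
grepl(pattern, "12312312310")  # TRUE: "123" occurs three times in a row

# Full match followed by the two capture groups
regmatches("12312312310", regexec(pattern, "12312312310"))[[1]]
```

The last call returns the whole match, then "123" for group 1 and "10" for group 2; the two \1 occurrences do not get entries of their own because they are not capture groups.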
cpf <- "12312312310"
strcapture("\\b(.{3})(\\1)(\\1)(.{2})\\b", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
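In this case, the text captured by the first group, 123, is repeated twice right after it, so both \1 references match. Note also that each \1 is wrapped in its own parentheses here: a backreference is not itself a capture group, and strcapture() expects one capture group per column in proto (four in this case).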
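If the goal of using \1 was only to avoid typing (.{3}) three times, a backreference is the wrong tool, since it repeats the matched text rather than the sub-pattern. A minimal sketch, assuming that is the goal, is to build the pattern string in R instead; the names pattern, proto and res are illustrative, and strrep() and setNames() are standard base/stats functions.

```r
cpf <- "12345678910"

# strrep() repeats the sub-pattern "(.{3})" three times, giving three
# independent capture groups, followed by a final two-character group.
pattern <- paste0("\\b", strrep("(.{3})", 3), "(.{2})\\b")

# One character() column per capture group, named n1..n4
proto <- setNames(rep(list(character()), 4), paste0("n", 1:4))

res <- strcapture(pattern, cpf, proto = proto)
res
```

This yields the same 123 / 456 / 789 / 10 split as the first call in the question, while each group is still free to match different digits.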