- My Question:
My regex pattern is: (a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)
My string is: abcdefghijklmnopqrstuvwxyz
The code's output is:
i_0:0 i_1:26 i_2:0 i_3:1 i_4:1 i_5:2 i_6:2 i_7:3 i_8:3 i_9:4 i_10:4 i_11:5 i_12:5 i_13:6 i_14:6 i_15:7 i_16:7 i_17:8 i_18:8 i_19:9 i_20:9 i_21:10 i_22:10 i_23:11 i_24:11 i_25:12 i_26:12 i_27:13 i_28:13 i_29:14 i_30:14 i_31:15 i_32:15 i_33:16 i_34:16 i_35:17 i_36:17 i_37:18 i_38:18 i_39:19 i_40:0 i_41:0 i_42:0 i_43:0 i_44:0 i_45:0 i_46:0 i_47:0 i_48:0 i_49:0 i_50:0 i_51:0 i_52:0 i_53:0 i_54:0 i_55:0 i_56:0 i_57:0 i_58:0 i_59:0
Question: Why does PCRE capture only 19 groups?
- My Code
#include <pcre.h>
#include <iostream>
#include <string>

pcre* _rex;
pcre_extra* _rexEx;

// Compile the pattern and study it (with JIT) into the two globals above.
void CompileRexStr(const std::string& rex) {
    const char* errorinfo;
    int errpos = 0;
    _rex = NULL;
    _rexEx = NULL;
    _rex = pcre_compile(rex.c_str(), PCRE_UTF8, &errorinfo, &errpos, NULL);
    _rexEx = pcre_study(_rex, PCRE_STUDY_JIT_COMPILE, &errorinfo);
}

int main() {
    std::string rex = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)";
    CompileRexStr(rex);
    std::string str = "abcdefghijklmnopqrstuvwxyz";
    int result[60] = {0};  // output vector passed to pcre_exec
    int cur = 0;           // start offset in the subject string
    int pos = pcre_exec(_rex, _rexEx, str.c_str(), static_cast<int>(str.length()), cur, 0, result, 60);
    // Dump every element of the output vector.
    for (int i = 0; i < 60; i++) {
        std::cout << "i_" << i << ":" << result[i] << " ";
    }
    return 0;
}
- Answer:
pcre_exec() reports only 19 capture groups because the 60-element vector you passed has room for 20 offset pairs, and the first pair is used for the whole match. From the manual:
Captured substrings are returned to the caller via a vector of integers whose address is passed in ovector. The number of elements in the vector is passed in ovecsize, which must be a non-negative number. Note: this argument is NOT the size of ovector in bytes.
The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down.
Source: Manual for PCRE
With 26 capture groups, you need to pass a vector of at least (26 + 1) × 3 = 81 elements.