I am trying to build a regular expression to match a list of numbers inside an HTML tag.
The tag is data-p="and the content is here".
Inside the content there is a list of numbers formatted like this:
["16834091899728893939","8871244709062187521","3716487480481705970","1266937738203421917"]
I would like the regex to return the list of digits: [16834091899728893939, 8871244709062187521, 3716487480481705970, 1266937738203421917]
Is it possible to match a list inside an already matched group?
It is easy to match the content of all data-p tags on the whole page with "data-p="(.*?)"", but I cannot get the list of numbers from inside.
Is it possible to do it all in one regex?
Thanks!
Full HTML below:
data-p="%[email protected],null,null,null,null,null,[8,null,["oyo rooms gurgaon",[null,null,null,"INR",[[2023,1,27],[2023,1,28],1,null,0],null,[],[],null,null,null,null,null,null,null,null,null,null,null,null,[[[353,null,true],[]]],null,null,null,[],null,null,null,[],null,null,null,null,[],null,null,null,null,null,null,null,null,null,null,[]],[null,["16834091899728893939","8871244709062187521","3716487480481705970","1266937738203421917"],null,null,null,null,1,1,3,null,null,null,null,null,[],null,null,null,null,"Gurugram, Haryana",null,null,null,null,null,null,null,null,null,null,null,null,"oyo rooms gurgaon",null,[false]],0,null,null,0,null,false,null,null,false,null,null,null,null,null,[[[1],[3,[null,true]],[5,[null,true]],[4,[null,true]],[6],[7],[8]],false]],null,null,null,null,null,2]]"
CodePudding user response:
It depends on which regex engine you are using. For example, using the PCRE engine you can construct the following regex:
(?:data-p="[^"]*?\[|\G,)"(\d+)"
Here is the demo.
This expression matches a "(\d+)" string under two conditions: it must be preceded either by the data-p="[^"]*?\[ pattern or by the \G, pattern. The first pattern is straightforward: it anchors the first number to a [ that follows data-p=" with no quote in between. The second uses \G, which asserts the position where the previous match ended, so a comma only counts when it directly follows the previously matched number. This prevents "(\d+)" from matching after just any comma. In the demo above, it prevents the string 456 in other-tag from being matched.
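If your engine has no \G (Python's built-in re module, for example, does not support it), the same result can be had in two steps: first capture the bracketed list that follows data-p=", then pull the digit runs out of that capture. The following is only a rough sketch under a few assumptions: Python is my choice of language, extract_ids and the sample string are made-up names and data, and the lazy .*? can run past the attribute if a data-p has no such list.

```python
import re

# Step 1: find the first bracketed list of quoted digit strings after data-p=".
# Assumption: the lazy .*? may cross into later attributes if data-p has no list.
LIST_RE = re.compile(r'data-p=".*?\[("\d+"(?:,"\d+")*)\]', re.DOTALL)

# Step 2: extract each run of digits from the captured list.
DIGITS_RE = re.compile(r'\d+')


def extract_ids(html):
    """Return one list of digit strings per data-p match (hypothetical helper)."""
    return [DIGITS_RE.findall(m.group(1)) for m in LIST_RE.finditer(html)]


# Shortened, made-up sample in the same shape as the HTML in the question.
sample = (
    'data-p="[8,null,["oyo rooms gurgaon",[null]],'
    '[null,["16834091899728893939","8871244709062187521"],null]]" '
    'other-tag="["456"]"'
)

print(extract_ids(sample))
```

Unlike the [^"]*? version, the first step here uses .*? so it can skip over earlier quoted strings such as "oyo rooms gurgaon" before reaching the list. Wrap the results in int(...) if you need numbers rather than strings.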