This question is related to "RegEx: Grabbing values between quotation marks", which I've tried to implement in my actual code, but without success.
What I'd like to accomplish is to parse PHP code and grab the literal double-quoted strings inside it, in order to automatically fix wrong/bad/insecure things.
Solutions using token_get_all() are not valid, as the PHP code may not parse correctly (invalid, broken, old PHP 4 code).
The regular expression should:
- Match only if a double-quote is not preceded by a single quote
- Match only if a double-quote is not followed by a single quote
- Also match backslashes inside the double-quoted string
- Leave the opening and closing double quotes untouched (return them as part of the match)
As an example of what the regexp should match, consider these pieces of (ugly, old and insecure) PHP code:
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$sql = "UPDATE $table_name SET
password = password('$newpass'), pchange = '1'
WHERE email = '$email'";
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";
The regular expression should match:
"Last-Modified: "
"D, d M Y H:i:s"
" GMT"
"UPDATE $table_name SET password = password('$newpass'), pchange = '1' WHERE email = '$email'"
"<p><a href=\"login.html\">Login</a></p>"
"<label for=\"whatever\">LABEL</label><select class='"
"'>"
The regexp will be used within a preg_match() call with PREG_OFFSET_CAPTURE, to restart the search where the last match occurred, in this way:
$string_match = preg_match(**REGEXP_HERE**, $php_code, $text_in_double_quotes, PREG_OFFSET_CAPTURE, $last_pos);
if ($string_match) {
list($text_in_double_quotes, $last_pos) = $text_in_double_quotes[0];
}
Thank you!
P.S.
For those asking why I'm bothering doing this, here's a Working demo with the Regular Expression suggested by @bobblebubble that shows exactly why I'm looking for such a particular regex (and why I can't use preg_match_all in this case)
Answer:
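You could use the backtracking control verbs (*SKIP)(*F) to exclude single-quoted substrings. In the pattern below, (?!) is an equivalent way of writing (*F): it always fails.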
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';
See this demo at regex101 - The underlying pattern is from this answer.
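As a quick check of what the two alternatives do, here is a minimal sketch that runs the pattern against two lines taken from the question. The variable names $only_single, $escaped and $m are only illustrative.

```php
<?php
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// This line contains double quotes only inside single-quoted strings.
// The first alternative consumes each '...' and then fails on purpose,
// so the regex engine skips past it and nothing is reported.
$only_single = '$var = \'"\' . $something . \'"\';';
var_dump(preg_match($regex, $only_single)); // int(0)

// Here the backslash-escaped quotes stay inside a single match,
// because \\. consumes a backslash together with the character after it.
$escaped = '$msg = "<p><a href=\\"login.html\\">Login</a></p>";';
preg_match($regex, $escaped, $m);
echo $m[0], "\n"; // "<p><a href=\"login.html\">Login</a></p>"
```

Note that the dot does not match a newline without the s flag, so a backslash immediately followed by a line break inside a string would end the match attempt there.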
To extract multiple items, use this regex with preg_match_all like this:
if(preg_match_all($regex, $str, $out) > 0) {
print_r($out[0]);
}
Here is a PHP demo at tio.run; the matches will be in $out[0] (the full-pattern matches).
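Since the question asks for preg_match() with PREG_OFFSET_CAPTURE rather than preg_match_all, the same pattern could also be driven in a loop. The following is only a sketch, assuming $php_code is not modified between matches. If it is, the offset would have to be adjusted by the length difference of each replacement. The names $found, $m, $text and $start are illustrative.

```php
<?php
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Sample input taken from the question (nowdoc, so nothing is interpolated).
$php_code = <<<'CODE'
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$var = '"' . $something . '"';
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";
CODE;

$found = [];
$last_pos = 0;
while (preg_match($regex, $php_code, $m, PREG_OFFSET_CAPTURE, $last_pos) === 1) {
    // $m[0] is array(matched text, byte offset of the match start).
    list($text, $start) = $m[0];
    $found[] = $text;
    // Resume after the end of this match; restarting at $start
    // would return the same match again.
    $last_pos = $start + strlen($text);
}
print_r($found);
```

With this input, $found should contain the five double-quoted strings from the first and third lines, and nothing from the $var line.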