I need to implement a preg_replace to fix some warnings that I get in a huge number of scripts.
My goal is to replace statements like...
$variable[key] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST[username]);
if ($result[ECD] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot[wafer][25]) {
... with the same statements, where the bare CONSTANTS are replaced by quoted ARRAY KEYS ...
$variable['key'] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST['username']);
if ($result['ECD'] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot['wafer'][25]) {
but excluding cases where the array variable appears within a string, i.e. ...
$output = "<input name='variable[key]' has to be preserved as it is.";
$output = 'Even this string variable[key] has to be preserved as it is.';
...because they would be replaced (but this is not what I want) with:
$output = "<input name='variable['key']' has to be preserved as it is.";
$output = 'Even this string variable['key'] has to be preserved as it is.';
Every statement is identified by a ''preg_match_all'' call and then replaced with a ''str_replace'':
preg_match_all('/(\[(\w*)\])/', $str, $matches, PREG_SET_ORDER, 0);
$replace_str = $str;
$local_changeflag = false;
foreach($matches as $m) {
    if (!$m[2]) continue;
    if (is_numeric($m[2])) continue;
    $replace_str = str_replace($m[1], "['" . $m[2] . "']", $replace_str);
    $local_changeflag = true;
}
Do you have any suggestion to better solve such issue that I have?
CodePudding user response:
[I know this isn't regexp, but since you asked for 'suggestion to better solve such issue' I give you my 2 cents]
How about simply parsing the code ;):
$source = file_get_contents('/tmp/test.php'); // Change this
$tokens = token_get_all($source);
$history = [];
foreach ($tokens as $token) {
    if (is_string($token)) { // simple 1-character token
        array_push($history, str_replace(["\r", "\n"], '', $token));
        $history = array_slice($history, -2);
        echo $token;
    } else {
        list($id, $text) = $token;
        switch ($id) {
            case T_STRING:
                if ($history == [T_VARIABLE, '[']) {
                    // Token sequence is [T_VARIABLE, '[', T_STRING]
                    echo "'$text'";
                }
                else {
                    echo $text;
                }
                break;
            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }
        array_push($history, $id);
        $history = array_slice($history, -2);
    }
}
Of course, the $source needs to be changed to whatever suits you. token_get_all() then loads the PHP code and parses it into a list of tokens. That list is then processed item by item and possibly changed before being output again, according to our needs.
1-char tokens like [ ("[" and "]" in e.g. $myVariable[1] both get to be tokens) are a special case which has to be handled in the loop. Otherwise $token is an array with an ID for the type of token and the token itself.
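To see what the token list described above looks like, here is a minimal sketch that maps each token to a readable name. The helper name describeTokens is made up for illustration; token_get_all() and token_name() are the built-in tokenizer functions.

```php
<?php
// Hypothetical helper (describeTokens is an illustrative name, not part of PHP).
// Returns one readable entry per token of the given PHP source.
function describeTokens(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        // 1-character tokens are plain strings; all others are [id, text, line] arrays.
        $names[] = is_string($token) ? $token : token_name($token[0]);
    }
    return $names;
}

print_r(describeTokens('<?php $myVariable[key];'));
// Entries, in order: T_OPEN_TAG, T_VARIABLE, [, T_STRING, ], ;
```

The T_VARIABLE, [, T_STRING run in the middle is the sequence the loop looks for.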
"Unfortunately" T_STRING is kind of a general case, so to pinpoint only the strings being used as constants in array indexing we store the 2 items preceding the current one in $history ("$myVariable" and "[").
..and..that's it, really. The code is read from a file, processed and output to stdout. Everything but the "constants as array index" case should be output as is.
If you like I can rewrite it as a function or something. The above should be kind of the general solution, though.
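A rough sketch of what such a function version might look like, under a few assumptions of mine: the name quoteBareArrayKeys is hypothetical, the result is returned as a string instead of echoed, whitespace tokens are kept out of the history so that $a[ key ] is also caught, and everything inside an interpolated double-quoted string or heredoc is left alone (inside "$a[key]" the bare key is valid, and quoting it would be a parse error).

```php
<?php
// Hypothetical helper; the name and the extra guards are assumptions,
// not part of the loop shown above.
function quoteBareArrayKeys(string $source): string
{
    $output   = '';
    $history  = [];     // the two most recent non-whitespace tokens
    $inString = false;  // true inside an interpolated string or heredoc

    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {            // simple 1-character token
            if ($token === '"') {
                $inString = !$inString;     // start/end of an interpolated string
            }
            $output   .= $token;
            $history[] = $token;
        } else {
            [$id, $text] = $token;
            if ($id === T_WHITESPACE) {     // keep whitespace, but not in the history
                $output .= $text;
                continue;
            }
            if ($id === T_START_HEREDOC) {
                $inString = true;
            } elseif ($id === T_END_HEREDOC) {
                $inString = false;
            }
            // Token sequence [T_VARIABLE, '[', T_STRING] outside of strings
            if ($id === T_STRING && !$inString && $history === [T_VARIABLE, '[']) {
                $output .= "'" . $text . "'";
            } else {
                $output .= $text;
            }
            $history[] = $id;
        }
        $history = array_slice($history, -2);
    }

    return $output;
}

echo quoteBareArrayKeys('<?php $variable[key] = "WhatElse";');
```

Like the loop above, this only looks at a key that directly follows a variable, so the second index in something like $a[b][c] is not touched. It also cannot tell a real defined constant used as a key from a forgotten pair of quotes, so the result is worth reviewing before overwriting any files.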
CodePudding user response:
If you want to quote any valid identifier inside square brackets, you can use preg_replace directly:
$regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';
$output = preg_replace($regex, "['\$3']", $text);
See the regex demo. Details:
(["'])(?:(?=(\\?))\2.)*?\1 - matches a string between single or double quotation marks (contains two capturing groups)
(*SKIP)(*F) - discards the matched text and fails the match, starting a new search from the failure location
| - or
\[ - a [ char
([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) - Group 3: a letter, underscore, or any char from the \x7f-\xff range, and then any alphanumeric, underscore or any char from the \x7f-\xff range
] - a ] char.
See the PHP demo:
$regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';
$str = '$output = "<input name=\'variable[key]\' has to be preserved as it is.";
$output = \'Even this string variable[key] has to be preserved as it is.\';
$variable[key] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST[username]);
if ($result[ECD] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot[wafer][25]) {';
echo preg_replace($regex, "['\$3']", $str);
Output:
$output = "<input name='variable[key]' has to be preserved as it is.";
$output = 'Even this string variable[key] has to be preserved as it is.';
$variable['key'] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST['username']);
if ($result['ECD'] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot['wafer'][25]) {
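Since the code in the question also keeps a $local_changeflag, here is a minimal sketch of how the same regex could report whether anything was replaced, using the count argument of preg_replace. The function name quoteKeysWithRegex and the [result, changed] return shape are assumptions for illustration only.

```php
<?php
// Hypothetical wrapper; the name and the return shape are assumptions.
// Returns [rewritten code, true if at least one key was quoted].
function quoteKeysWithRegex(string $code): array
{
    $regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';

    $count  = 0;
    // The 5th argument receives the number of replacements performed.
    $result = preg_replace($regex, "['\$3']", $code, -1, $count);
    if ($result === null) {
        // preg_replace returns null on error (e.g. backtrack limit reached)
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return [$result, $count > 0];
}

[$replace_str, $local_changeflag] = quoteKeysWithRegex('$variable[key] = "WhatElse";');
echo $replace_str, "\n";
var_dump($local_changeflag);
```

One thing to watch for: the regex knows nothing about comments, so an apostrophe inside a comment (for example // don't) can be taken as the start of a string, and bracketed keys up to the next apostrophe would then be skipped.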