PHP 'preg_match_all' and 'str_replace': regular expression to replace constants

Time:12-22

I need to implement a preg_replace to fix some warnings that I get in a huge number of scripts.

My goal is to replace statements like...

$variable[key] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST[username]);
if ($result[ECD] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot[wafer][25]) {

... with the same statements, where the bare CONSTANTS are replaced by quoted ARRAY KEYS ...

$variable['key'] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST['username']);
if ($result['ECD'] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot['wafer'][25]) {

but excluding cases where the array reference appears inside a string, i.e. ...

$output = "<input name='variable[key]' has to be preserved as it is.";
$output = 'Even this string variable[key] has to be preserved as it is.';

...because they would be replaced (but this is not what I want) with:

$output = "<input name='variable['key']' has to be preserved as it is.";
$output = 'Even this string variable['key'] has to be preserved as it is.';

Every statement is identified by a ''preg_match_all'' call and then replaced with ''str_replace'':

preg_match_all('/(\[(\w*)\])/', $str, $matches, PREG_SET_ORDER, 0);
$replace_str = $str;
$local_changeflag = false;
foreach($matches as $m) {
    if (!$m[2]) continue;
    if (is_numeric($m[2])) continue;
    $replace_str = str_replace($m[1], "['" . $m[2] . "']", $replace_str);
    $local_changeflag = true;
}

Do you have any suggestion to better solve this issue?

CodePudding user response:

[I know this isn't a regexp, but since you asked for suggestions to better solve the issue, here are my 2 cents]

How about simply parsing the code ;):

$source = file_get_contents('/tmp/test.php'); // Change this
$tokens = token_get_all($source);

$history = [];
foreach ($tokens as $token) {
    if (is_string($token)) { // simple 1-character token       
        array_push($history, str_replace(["\r", "\n"], '', $token));
        $history = array_slice($history, -2);

        echo $token;
    } else {
        list($id, $text) = $token;

        switch ($id) {
            case T_STRING:
                if ($history == [T_VARIABLE, '[']) {
                    // Token sequence is [T_VARIABLE, '[', T_STRING]
                    echo "'$text'";
                }
                else {
                    echo $text;
                }
                break;

            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }

        array_push($history, $id);
        $history = array_slice($history, -2);
    }
}

Of course, $source needs to be changed to whatever file suits you. token_get_all() then splits the PHP code into a list of tokens. That list is processed item by item, and each token is either changed or output as is, according to our needs.

1-character tokens (for example, both "[" and "]" in $myVariable[1]) are returned as plain strings, which is a special case that has to be handled in the loop. Otherwise $token is an array holding the ID of the token type, the token text, and the line number.

"Unfortunately" T_STRING is a rather general token type, so to pinpoint only the identifiers used as bare constants in array indexing, we store the 2 tokens preceding the current one in $history (the variable, e.g. "$myVariable", and "[").
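To make the token list more concrete, here is a small sketch that prints the token types for a one-line snippet. The sample snippet and the $names variable are made up for illustration; token_get_all() and token_name() are standard PHP functions.

```php
<?php
// Hypothetical one-line sample; the opening tag is needed so the text is tokenized as PHP.
$tokens = token_get_all('<?php $myVariable[key];');

$names = [];
foreach ($tokens as $token) {
    // Plain strings are 1-character tokens; arrays are [id, text, line].
    $names[] = is_string($token) ? $token : token_name($token[0]);
}

echo implode(' ', $names), "\n";
// T_OPEN_TAG T_VARIABLE [ T_STRING ] ;
```

The bare key shows up as T_STRING right after T_VARIABLE and "[", which is exactly the sequence the loop above looks for.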

..and..that's it, really. The code is read from a file, processed and output to stdout. Everything but the "constants as array index" case should be output as is.

If you like I can rewrite it as a function or something. The above should be kind of the general solution, though.
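One possible way to wrap the loop above in a function is sketched below. The function name quoteBareArrayKeys is hypothetical, and the $inDoubleQuotes guard is an addition that is not part of the loop above: inside an interpolated double-quoted string such as "x $a[key] y" the tokenizer also emits T_VARIABLE, "[", T_STRING, and quoting the key there would break the code. Heredocs and whitespace inside the brackets are not handled in this sketch.

```php
<?php
// Hypothetical helper: returns $source with bare array keys quoted.
function quoteBareArrayKeys(string $source): string
{
    $out = '';
    $history = [];            // the two tokens preceding the current one
    $inDoubleQuotes = false;  // true while inside an interpolated "..." string

    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // 1-character token; a lone '"' opens or closes an interpolated string
            if ($token === '"') {
                $inDoubleQuotes = !$inDoubleQuotes;
            }
            $out .= $token;
            $history[] = $token;
        } else {
            [$id, $text] = $token;
            if ($id === T_STRING && !$inDoubleQuotes && $history === [T_VARIABLE, '[']) {
                // Token sequence is T_VARIABLE, '[', T_STRING: quote the key
                $out .= "'" . $text . "'";
            } else {
                $out .= $text;
            }
            $history[] = $id;
        }
        $history = array_slice($history, -2);
    }

    return $out;
}

$code = "<?php\n\$variable[key] = \"WhatElse\";\n\$output = 'variable[key] stays';\n";
echo quoteBareArrayKeys($code);
```

Returning a string instead of echoing makes it easier to compare the result with the original source and write the file back only when something changed.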

CodePudding user response:

If you want to wrap any valid identifiers inside square brackets, you can use preg_replace directly:

$regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';
$output = preg_replace($regex, "['\$3']", $text);

Details:

  • (["'])(?:(?=(\\?))\2.)*?\1 - matches a string between single or double quotation marks (contains two capturing groups)
  • (*SKIP)(*F) - discards the matched text and fails the match starting a new search from the failure location
  • | - or
  • \[ - [ char
  • ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) - Group 3: a letter, underscore, or any char from the \x7f-\xff range, followed by zero or more alphanumerics, underscores, or chars from the \x7f-\xff range
  • ] - a ] char.

See the PHP demo:

$regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';
$str = '$output = "<input name=\'variable[key]\' has to be preserved as it is.";
$output = \'Even this string variable[key] has to be preserved as it is.\';

$variable[key] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST[username]);
if ($result[ECD] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot[wafer][25]) {';
echo preg_replace($regex, "['\$3']", $str);

Output:

$output = "<input name='variable[key]' has to be preserved as it is.";
$output = 'Even this string variable[key] has to be preserved as it is.';

$variable['key'] = "WhatElse";
$result = $wso->RSLA("7050", $vegalot, "600", "WFID_OK_WEB","1300", $_POST['username']);
if ($result['ECD'] != 0) {
if ($line=="AAAA" && in_array(substr($wso->lot,0,7),$lot_aaaa_list) && $lot['wafer'][25]) {
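If a change flag like $local_changeflag from the question is still needed, preg_replace can report the number of replacements through its fifth argument. Below is a minimal sketch, assuming the same regex as above; the helper name quoteKeysWithFlag is made up for illustration.

```php
<?php
// Regex from the answer above: skip quoted strings, capture bare identifiers in brackets.
$regex = '/(["\'])(?:(?=(\\\\?))\2.)*?\1(*SKIP)(*F)|\[([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*)]/s';

// Hypothetical helper: returns [rewritten code, whether anything was replaced].
function quoteKeysWithFlag(string $regex, string $code): array
{
    $count = 0;
    // The fifth argument receives the number of replacements performed.
    $result = preg_replace($regex, "['\$3']", $code, -1, $count);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return [$result, $count > 0];
}

[$fixed, $changed] = quoteKeysWithFlag($regex, 'if ($result[ECD] != 0) {');
[$kept, $keptChanged] = quoteKeysWithFlag($regex, "\$output = 'variable[key] stays';");
```

One caveat to keep in mind: the regex treats every quote character as the start of a string, so an apostrophe inside a comment (or a heredoc body) may shift what it considers quoted. Reviewing the diff before overwriting files is probably wise.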