PHP :: Parse strings, while iterating through an array of substrings?

Time:04-11

I'm a Java developer who is struggling to write his first PHP script. FYI, I'm coding with PHP 8.1.2 on an Ubuntu machine.

My code has to open a log file, read the lines one-by-one, then extract a key substring based on the preamble of the string. For example, if the log file is:

April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2:  Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)
...many more lines...

Then I need a script that reads each line and extracts:

Kermit the Frog
Miss Piggy
The Muppet Movie (1979)
...many more items...

I won't know those above values before the file is read.
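For the "open a log file and read the lines one-by-one" part, a minimal sketch using `fopen()`/`fgets()` might look like this. The temporary file and its two sample lines are hypothetical, created only so the example runs on its own; in a real script `$path` would point at the actual log.

```php
<?php
// Hypothetical sample log, written to a temporary file so the sketch runs on its own.
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents(
    $path,
    "April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog\n" .
    "April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)\n"
);

$lines = [];
$fh = fopen($path, 'r');
if ($fh === false) {
    throw new RuntimeException("Cannot open $path");
}
// fgets() returns the next line, including its line ending, or false at end of file.
while (($line = fgets($fh)) !== false) {
    $lines[] = rtrim($line, "\r\n");   // drop the trailing newline (and CR, if any)
}
fclose($fh);
unlink($path);

print_r($lines);
```

Reading with `fgets()` keeps only one line in memory at a time; for small files, `file($path, FILE_IGNORE_NEW_LINES)` is a shorter alternative.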

This problem is very solvable. Here's my PHP code, where $str is one line of the input file:

    function parseThisStr($str){
            if( str_contains($str, "Interesting Character #1:  ") ){
                    $mySubstr = "Interesting Character #1:  ";
                    $tmpIndex = strpos( $str, $mySubstr );
                    $tmpIndex += strlen($mySubstr);
                    $str2 = substr( $str, $tmpIndex );
                    $str2 = preg_replace('~[\r\n]+~', '', $str2);   // remove newline
                    return $str2;
            }
            else if( str_contains($str, "Interesting Character #2:  ") ){
                    $mySubstr = "Interesting Character #2:  ";
                    // ...copy code from above...
                    return $str2;
            }
            else if( str_contains($str, "Their Best Movie:  ") ){
                    $mySubstr = "Their Best Movie:  ";
                    // ...copy code from above...
                    return $str2;
            }
            return $str;
    }

This will work... but it's needlessly repetitive, right? For each substring I am checking, I need to copy five identical lines of code. There are about 30 substrings I need to search for; this will make my code about 150 lines longer than it needs to be.
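One way to remove the repetition, before even reaching for an array, is to move the repeated lines into a helper that takes the marker as a parameter. The name `extractAfter` below is hypothetical; this is a sketch, not the only way to structure it.

```php
<?php
// Hypothetical helper: returns the text after $marker, or null if $marker is absent.
function extractAfter(string $str, string $marker): ?string
{
    $pos = strpos($str, $marker);
    if ($pos === false) {
        return null;
    }
    // Skip past the marker itself (start position + its length), then drop any trailing CR/LF.
    return rtrim(substr($str, $pos + strlen($marker)), "\r\n");
}

$line = "April 01 2020 Key Information Read :: Interesting Character #2:  Miss Piggy\n";
echo extractAfter($line, "Interesting Character #2:  "), "\n";   // prints Miss Piggy
```

Each branch of the original `if`/`else if` chain then shrinks to a single call such as `return extractAfter($str, "Their Best Movie:  ");`.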

There's got to be a way to do this with more intelligence, right? Can't I store every to-be-searched substring in an array, maybe like this:

    $array = array(
        1    => "Interesting Character #1:  ",
        2    => "Interesting Character #2:  ",
        3    => "Their Best Movie:  ",
        // ...etc...
    );

...and then iterate through the array, maybe like this:

    function parseThisStr($str){
            $array = array(
                  1    => "Kermit the Frog",
                  ...etc...
            };
            foreach( $array as &$value ){
                if( str_contains($str, $value) ){
                        $tmpIndex = strpos( $str, $value );
                        $tmpIndex += strlen($value);
                        $str2 = substr( $str, $tmpIndex );
                        $str2 = preg_replace('~[\r\n]+~', '', $str2);   // remove newline
                        return $str2;
                }
            return null;
            }

Conceptually, this should work... but I can't figure out the correct syntax. PHP syntax is confusing to me, sadly. Does anyone see where I'm going wrong? Thank you.

EDIT: I screwed up the values of $array in my first posting. $array should have the substrings that I'll use to search the larger string.
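A corrected version of the loop might look like the sketch below. Compared with the attempt above, the array literal is closed with `)` rather than `}`, the `foreach` gets its own closing brace, `return null;` moves outside the loop so every marker is tried before giving up, the `&` is dropped because `$value` is only read, and the start offset is the marker's position plus its length. Using `strpos() !== false` instead of `str_contains()` followed by `strpos()` is a stylistic choice that avoids searching twice.

```php
<?php
function parseThisStr(string $str): ?string
{
    // The substrings ("markers") that precede the values we want.
    $markers = [
        "Interesting Character #1:  ",
        "Interesting Character #2:  ",
        "Their Best Movie:  ",
        // ...more markers...
    ];

    foreach ($markers as $marker) {
        $pos = strpos($str, $marker);
        if ($pos !== false) {
            // Everything after the marker, minus the trailing newline.
            return rtrim(substr($str, $pos + strlen($marker)), "\r\n");
        }
    }
    return null;   // only reached after every marker has been tried
}

echo parseThisStr("April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)\n"), "\n";
```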

Answer:

Using regexes will produce clearer code. For example, with preg_match:

$line = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog';
$searchTerms = ["Kermit the Frog","Miss Piggy","The Muppet Movie (1979)"];

// prepare regex with named group from terms
$delimiter = '~';
$regex = $delimiter . '(?<phrase>(' . join('|', array_map(fn($term) => preg_quote($term, $delimiter), $searchTerms)) . '))' . $delimiter;

// search by regex
preg_match($regex, $line, $matches);
$foundPhrase = $matches['phrase'] ?? null;
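The snippet above searches for the values themselves, which matches the question's first posting. Per the question's EDIT, the array really holds the markers that precede the values. A sketch of the same named-group idea adapted to that case puts the escaped markers in a non-capturing alternation and captures whatever follows:

```php
<?php
$line = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog';
$markers = ["Interesting Character #1:  ", "Interesting Character #2:  ", "Their Best Movie:  "];

// Build ~(?:marker1|marker2|...)(?<phrase>.+)~ with each marker escaped by preg_quote().
$delimiter = '~';
$alternation = implode('|', array_map(fn($m) => preg_quote($m, $delimiter), $markers));
$regex = $delimiter . '(?:' . $alternation . ')(?<phrase>.+)' . $delimiter;

// preg_match() returns 1 on a match; .+ stops before a trailing "\n".
$foundPhrase = preg_match($regex, $line, $matches) ? $matches['phrase'] : null;
echo $foundPhrase, "\n";
```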

Answer:

You might use a regex with a more specific pattern:

\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+

The pattern matches:

  • \b A word boundary to prevent a partial word match
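  • (?:Interesting Character #\d+:|Their Best Movie:) Match either "Interesting Character #" followed by 1 or more digits and ":", or "Their Best Movie:"
  • \h+ Match 1 or more horizontal whitespace characters
  • \K Forget what is matched so far
  • .+ Match 1 or more characters (any character except a newline)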


$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
$str = 'April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2:  Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)
';

preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => Kermit the Frog
    [1] => Miss Piggy
    [2] => The Muppet Movie (1979)
)
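Because the question reads the log one line at a time, the same pattern can also be applied per line with `preg_match()`. The function name `parseLine` and the sample lines below are hypothetical; this is a sketch of one way to wire the regex into that loop.

```php
<?php
$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';

// Hypothetical per-line helper: returns the extracted value, or null if the line doesn't match.
function parseLine(string $line, string $re): ?string
{
    // On a match, $m[0] holds only the text after \K.
    return preg_match($re, $line, $m) === 1 ? rtrim($m[0], "\r\n") : null;
}

$lines = [
    "April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog",
    "April 01 2020 Some unrelated line",
    "April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)",
];

$results = [];
foreach ($lines as $line) {
    $value = parseLine($line, $re);
    if ($value !== null) {
        $results[] = $value;
    }
}
print_r($results);
```

The `rtrim()` is there because `.` does match a `\r`, so lines with Windows-style endings would otherwise keep it.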

Another pattern, with a somewhat broader match, takes the leading :: into account and matches everything up to the first following ":":

::\h+[^:\r\n]+:\h+\K.+
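A quick sketch of that broader pattern with `preg_match_all()`, reusing two of the sample lines. It needs no list of labels, but it will also pick up any other `:: label: value` line in the log, which may or may not be wanted.

```php
<?php
// [^:\r\n]+ matches the label without crossing a colon or a line break.
$re = '/::\h+[^:\r\n]+:\h+\K.+/';
$str = "April 01 2020 Key Information Read :: Interesting Character #1:  Kermit the Frog\n"
     . "April 01 2020 Key Information Read :: Their Best Movie:  The Muppet Movie (1979)\n";

preg_match_all($re, $str, $matches);
print_r($matches[0]);
```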

