I'm a Java developer who is struggling to write his first PHP script. FYI, I'm coding with PHP 8.1.2 on an Ubuntu machine.
My code has to open a log file, read the lines one-by-one, then extract a key substring based on the preamble of the string. For example, if the log file is:
April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)
...many more lines...
Then I need a script that reads each line and extracts:
Kermit the Frog
Miss Piggy
The Muppet Movie (1979)
...many more items...
I won't know the values above before the file is read.
This problem is very solvable. Here's my PHP code, where $str is one line of the input file:
function parseThisStr($str){
    if( str_contains($str, "Interesting Character #1: ") ){
        $mySubstr = "Interesting Character #1: ";
        $tmpIndex = strpos( $str, $mySubstr );
        $tmpIndex += strlen($mySubstr); // skip past the preamble
        $str2 = substr( $str, $tmpIndex );
        $str2 = preg_replace('~[\r\n]+~', '', $str2); // remove newline
        return $str2;
    }
    else if( str_contains($str, "Interesting Character #2: ") ){
        $mySubstr = "Interesting Character #2: ";
        ...copy code from above...
        return $str2;
    }
    else if( str_contains($str, "Their Best Movie: ") ){
        $mySubstr = "Their Best Movie: ";
        ...copy code from above...
        return $str2;
    }
    return $str;
}
This will work... but it's needlessly repetitive, right? For each substring I am checking, I need to copy five identical lines of code. There are about 30 substrings I need to search for; this will make my code about 150 lines longer than it needs to be.
There's got to be a way to do this with more intelligence, right? Can't I store every to-be-searched substring in an array, maybe like this:
$array = array(
1 => "Interesting Character #1: ",
2 => "Interesting Character #2: ",
3 => "Their Best Movie: ",
...etc...
);
...and then iterate through the array, maybe like this:
function parseThisStr($str){
$array = array(
1 => "Kermit the Frog",
...etc...
};
foreach( $array as &$value ){
if( str_contains($str, $value) ){
$tmpIndex = strpos( $str, $value );
$tmpIndex += strlen($value);
$str2 = substr( $str, $tmpIndex );
$str2 = preg_replace('~[\r\n]+~', '', $str2); // remove newline
return $str2;
}
return null;
}
Conceptually, this should work... but I can't figure out the correct syntax. PHP syntax is confusing to me, sadly. Does anyone see where I'm going wrong? Thank you.
EDIT: I screwed up the values of $array in my first posting. $array should have the substrings that I'll use to search the larger string.
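A minimal sketch of the array-and-loop idea described above, assuming the preambles are known in advance. The function name extractAfterPreamble and its parameters are illustrative, not from the original post. Compared with the attempt above, the array is closed with ) rather than }, the foreach block gets its own closing brace, and the loop iterates by value, since nothing is modified.

```php
<?php
declare(strict_types=1);

/**
 * Returns the text that follows the first matching preamble in $line,
 * or null when none of the preambles occurs.
 * (Function and parameter names are hypothetical.)
 */
function extractAfterPreamble(string $line, array $preambles): ?string
{
    foreach ($preambles as $preamble) {
        $pos = strpos($line, $preamble);
        if ($pos === false) {
            continue; // this preamble is not in the line
        }
        // Skip past the preamble, then strip any trailing line break.
        return rtrim(substr($line, $pos + strlen($preamble)), "\r\n");
    }
    return null;
}

$preambles = [
    "Interesting Character #1: ",
    "Interesting Character #2: ",
    "Their Best Movie: ",
];

$line = "April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)\n";
echo extractAfterPreamble($line, $preambles), PHP_EOL; // The Muppet Movie (1979)
```

Checking strpos against false with === matters here: a preamble at position 0 would otherwise be treated as "not found".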
CodePudding user response:
Using a regex will produce clearer code. For example, with preg_match:
$line = 'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog';
$searchTerms = ["Kermit the Frog","Miss Piggy","The Muppet Movie (1979)"];
// prepare regex with named group from terms
$delimiter = '~';
$regex = $delimiter . '(?<phrase>(' . join('|', array_map(fn($term) => preg_quote($term, $delimiter), $searchTerms)) . '))' . $delimiter;
// search by regex
preg_match($regex, $line, $matches);
$foundPhrase = $matches['phrase'] ?? null;
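The same technique can be pointed at the preambles rather than the values, which may fit better when the values are not known before the file is read. This is only a sketch under that assumption; the names $preambles, $alternatives and the group name value are illustrative.

```php
<?php
$line = 'April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy';
$preambles = ['Interesting Character #1: ', 'Interesting Character #2: ', 'Their Best Movie: '];

$delimiter = '~';
// Escape each preamble so characters such as "#" or ":" are taken literally.
$alternatives = array_map(fn($p) => preg_quote($p, $delimiter), $preambles);
// Non-capturing group of preambles, then a named group holding the rest of the line.
$regex = $delimiter . '(?:' . implode('|', $alternatives) . ')(?<value>.+)' . $delimiter;

// preg_match() returns 1 on a match, 0 on no match, false on error.
$value = preg_match($regex, $line, $matches) === 1 ? $matches['value'] : null;
echo $value, PHP_EOL; // Miss Piggy
```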
CodePudding user response:
You might use a regex with a more specific pattern:
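\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+
The pattern matches:
\b - a word boundary, to prevent a partial word match
(?:Interesting Character #\d+:|Their Best Movie:) - a non-capturing group matching either Interesting Character # followed by 1 or more digits and a colon, or Their Best Movie:
\h+ - 1 or more horizontal whitespace characters
\K - forget what is matched so far
.+ - 1 or more characters (the rest of the line)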
See a regex demo and a PHP demo
$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
$str = 'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog
April 01 2020 Key Information Read :: Interesting Character #2: Miss Piggy
April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)
';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
[0] => Kermit the Frog
[1] => Miss Piggy
[2] => The Muppet Movie (1979)
)
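Since the question asks for reading the log one line at a time, here is a sketch of the same pattern applied inside an fgets() loop. The temporary file and its contents are made up so the example is self-contained; in practice $path would point at the real log file.

```php
<?php
// Hypothetical sample file; replace $path with the real log location.
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($path, implode("\n", [
    'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog',
    'April 01 2020 Some Other Event :: nothing to extract here',
    'April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)',
]) . "\n");

$re = '/\b(?:Interesting Character #\d+:|Their Best Movie:)\h+\K.+/';
$found = [];

$handle = fopen($path, 'r');
if ($handle === false) {
    throw new RuntimeException("Cannot open $path");
}
while (($line = fgets($handle)) !== false) {
    // rtrim() drops the line break that fgets() keeps at the end of each line.
    if (preg_match($re, rtrim($line, "\r\n"), $m) === 1) {
        $found[] = $m[0]; // because of \K, the full match starts after the preamble
    }
}
fclose($handle);
unlink($path); // clean up the sample file

print_r($found);
```

Lines without one of the listed preambles are skipped, so $found ends up holding only the two extracted values.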
Another pattern with a somewhat broader match, taking the leading :: into account and matching until the first occurrence of : after it:
::\h+[^:\r\n]+:\h+\K.+
See another regex demo
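A sketch of the broader pattern applied to individual lines. The third input line is invented to show that a label not listed anywhere is still accepted; the variable names are illustrative.

```php
<?php
// "::", whitespace, a label without colons, ":", whitespace, then the value.
$re = '/::\h+[^:\r\n]+:\h+\K.+/';

$lines = [
    'April 01 2020 Key Information Read :: Interesting Character #1: Kermit the Frog',
    'April 01 2020 Key Information Read :: Their Best Movie: The Muppet Movie (1979)',
    'April 01 2020 Key Information Read :: A Label Not Listed Anywhere: some value', // hypothetical line
];

$values = [];
foreach ($lines as $line) {
    if (preg_match($re, $line, $m) === 1) {
        $values[] = $m[0]; // text after the label and its colon
    }
}
print_r($values);
```

The trade-off is that no list of preambles has to be maintained, but any line with a label between :: and the next colon will match, including ones that may not be wanted.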