I'm a newbie. I'm trying to extract the full name from either of the lines below, without the "Obituary for" prefix:
<h2>Obituary for John Doe</h2>
<h1>James Michael Lee</h1>
My regex is this.
(<h1>(.*?)<\/h1>|<h2>Obituary\sfor\s(.*?)<\/h2>)
What I'm getting is still "Obituary for John Doe". How do I remove the "Obituary for" part?
CodePudding user response:
Many roads lead to Rome. You can probably do something like this:
<h(?:1>|2>Obituary\sfor\s)\K[^><]+
See this demo at regex101. The matches will be in $out[0]. \K resets the beginning of the reported match. See the SO Regex FAQ for more.
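A minimal sketch of how that pattern might be used from PHP, assuming the matches are collected with preg_match_all into $out as described above. The helper name findNames is made up for illustration, and the pattern here ends in [^><]+ so that the whole name is consumed rather than a single character.

```php
<?php
// Hypothetical helper: returns every name that follows <h1> or <h2>Obituary for
function findNames(string $html): array
{
    // \K discards everything matched so far, so only the name is reported
    $pattern = '/<h(?:1>|2>Obituary\sfor\s)\K[^><]+/';
    preg_match_all($pattern, $html, $out);
    return $out[0]; // full matches; no capture group is needed
}

$html = "<h2>Obituary for John Doe</h2>\n<h1>James Michael Lee</h1>";
print_r(findNames($html));
```

Because \K moves the start of the reported match, the names come back as the full match rather than as a numbered group, which avoids having to check two different capture groups.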
CodePudding user response:
Could you do something like this, using a regex only to grab the heading text and a plain string replacement to drop the prefix?
/**
 * @description : Extracts names from HTML header tags (<h1> and <h2>)
 * @example : "<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>" -> ["John Doe", "James Michael Lee"]
 * @param string $html
 * @return string[] : list of full names
 */
function extractFullNames($html) {
    // \1 forces the closing tag to match the opening one (h1 with h1, h2 with h2)
    $regex = '/<h([12])>(.*?)<\/h\1>/';
    preg_match_all($regex, $html, $matches);
    $names = $matches[2]; // group 2 holds the text between the tags
    $names = array_map('strip_tags', $names);
    $names = array_map('trim', $names);
    // No strtolower/ucwords pass: it would mangle names such as "McDonald"
    $names = array_map('removeObituary', $names);
    return $names;
}
/**
 * @description : Removes "Obituary for " if present (case-insensitive)
 * @example : "Obituary for John Doe" -> "John Doe"
 * @param string $name
 * @return string : name without "Obituary for"
 */
function removeObituary($name) {
    // str_ireplace ignores case, so "Obituary for" and "Obituary For" are both removed
    return str_ireplace("Obituary for ", "", $name);
}
// Test cases
$html = '<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>';
$names = extractFullNames($html);
$expected = ['John Doe', 'James Michael Lee'];
echo "Expected: " . implode(', ', $expected) . "\n";
echo "Actual: " . implode(', ', $names) . "\n";
CodePudding user response:
I'd probably do something like
/^(?:\s*<[^>]*?>)?(?:.*\s*for\s*)?([^<]*)/
and extract $1 (the first match group).
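One possible way to apply this in PHP, shown as a sketch only: it assumes each heading sits on its own line and that the pattern carries its * quantifiers, i.e. \s*, [^>]*?, .* and [^<]*. The function name nameFromLine is hypothetical.

```php
<?php
// Hypothetical helper: pulls the name out of a single heading line
function nameFromLine(string $line): ?string
{
    // optional opening tag, optional "... for " prefix, then the name up to the next "<"
    $pattern = '/^(?:\s*<[^>]*?>)?(?:.*\s*for\s*)?([^<]*)/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    return trim($m[1]); // $1, the first capture group
}

foreach (['<h2>Obituary for John Doe</h2>', '<h1>James Michael Lee</h1>'] as $line) {
    echo nameFromLine($line), "\n";
}
```

Note that the optional prefix group is greedy and only looks for "for", so a name that itself contains those letters (Bradford, for instance) would be cut short. Spelling out the full "Obituary for" text in that group may be safer.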