I want to extract dates from several HTML documents. The date always follows this pattern:
- The first three letters of the month, with the first character in uppercase, e.g. Jan
- A two-digit day of the month, e.g. 09
- A comma as a separator
- A four-digit year, e.g. 2022
A sample of a complete date is Jan 09, 2022
I want to extract only those dates which are wrapped in span tags. So, the complete pattern is
<span>Jan 09, 2022</span>
I am not good at writing regular expressions for preg_match. Can anyone please help me?
CodePudding user response:
<span>(\w{3} \d{1,2}, \w{4})<\/span>
\w is a meta-character for the set [a-zA-Z0-9_].
{3} means thrice.
\d is a meta-character for the set [0-9].
{1,2} means once or twice.
Try it https://regex101.com/r/tNRa73/1
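Note that \w{4} also accepts letters and underscores for the year, and \d{1,2} also accepts a one-digit day. If you want to follow the format in the question more closely, something like the sketch below might work. The stricter pattern and the names $strict and $samples are my own assumptions for illustration, not part of the pattern above, and it does not check that the three letters are a real month name.

```php
<?php
// Stricter variant (an assumption, not the pattern given above):
// [A-Z][a-z]{2} = one uppercase letter followed by two lowercase letters,
// \d{2}         = exactly two digits for the day,
// \d{4}         = exactly four digits for the year.
$strict = '/<span>([A-Z][a-z]{2} \d{2}, \d{4})<\/span>/';

// Hypothetical sample inputs.
$samples = [
    '<span>Jan 09, 2022</span>', // matches
    '<span>Jan 9, 2022</span>',  // no match: the day has only one digit
    '<span>jan 09, abcd</span>', // no match: lowercase month, non-numeric year
];

foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $found = preg_match($strict, $sample, $m) === 1;
    echo $sample, ' => ', $found ? $m[1] : 'no match', PHP_EOL;
}
```

The looser pattern above would accept the third sample, which is why you may prefer the stricter one if your documents contain other three-letter words in span tags.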
$pattern = '/<span>(\w{3} \d{1,2}, \w{4})<\/span>/';
preg_match(
$pattern,
$html,
$matches // <-- The results will be added to this new variable.
);
$matches[1]; // The date will be at index 1 because it was captured by
// the first "capture group", i.e. the first set of parentheses.
// $matches[0] holds the whole match, including the span tags.
// If you expect multiple dates in one HTML document, then use:
preg_match_all(
$pattern,
$html,
$matches
);
$matches[1]; // Now, the first index is an array of matches of
// the first "capture group".
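Here is a small end-to-end sketch of the preg_match_all version that you can run as-is. The sample $html string is made up for illustration (in practice it would come from your own documents), and the handling of the return value is just one way to do it.

```php
<?php
// Hypothetical sample document: two dates inside span tags, one outside.
$html = '<p>Posted <span>Jan 09, 2022</span>, updated <span>Feb 14, 2022</span>.</p>'
      . '<p>Not wrapped in a span: Mar 01, 2022</p>';

$pattern = '/<span>(\w{3} \d{1,2}, \w{4})<\/span>/';

// preg_match_all() returns the number of full matches, or false on error.
$count = preg_match_all($pattern, $html, $matches);

if ($count === false) {
    echo 'Regex error', PHP_EOL;
} elseif ($count === 0) {
    echo 'No dates found', PHP_EOL;
} else {
    // $matches[1] holds the text of capture group 1 for every match.
    foreach ($matches[1] as $date) {
        echo $date, PHP_EOL;
    }
}
```

This prints Jan 09, 2022 and Feb 14, 2022, and skips Mar 01, 2022 because it is not wrapped in span tags.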