Extracting data using hash sign in preg_match_all() pattern does not work

I am new to regular expressions. I am parsing an HTML page, and because it is buggy I cannot use an XML or HTML parser, so I am using a regular expression instead. My code looks like this:

$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="[A-Z\\d]+" data-index="\\d+"/', $html, $result);
var_dump($result);

The output looks good, so the pattern works. Now I want to extract the matched values. I did it exactly as described in this answer, and now the code looks like this:

$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="#([A-Z\\d]+)" data-index="#(\\d+)"/', $html, $result);
var_dump($result);

But it outputs an empty array. What is wrong? Please don't improve the pattern by adding the closing '>' or making it robust against whitespace. I just need to get the code running.

Answer:

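Remove the # characters from the pattern. Here # is an ordinary literal character, so data-id="#(...)" only matches if the HTML actually contains a # right after the opening quote, which your example data does not. That is why the result is empty. You can also write \d with a single backslash: in a single-quoted PHP string, '\\d' and '\d' are the same two characters, so that part is only a readability change. The code then looks like this: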

$html = '<html><div data-id="ABC012" data-index="123" ...';
preg_match_all('/<div data-id="([A-Z\d]+)" data-index="(\d+)"/', $html, $result);
var_dump($result);

Output

array(3) {
  [0]=>
  array(1) {
    [0]=>
    string(38) "<div data-id="ABC012" data-index="123""
  }
  [1]=>
  array(1) {
    [0]=>
    string(6) "ABC012"
  }
  [2]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
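To see the difference directly, here is a small sketch that runs both patterns against a hypothetical sample string (the second <div> is invented for illustration, since the original HTML is truncated with "..."). It also checks that '\\d' and '\d' are the same string in single quotes, and it uses the PREG_SET_ORDER flag, which is my own choice rather than part of the code above, so that each id comes back paired with its index.

```php
<?php
// Hypothetical sample input: the original HTML is truncated, so a second
// <div> is invented here to show more than one match.
$html = '<html><div data-id="ABC012" data-index="123"></div>'
      . '<div data-id="XYZ9" data-index="7"></div></html>';

// In a single-quoted PHP string, '\\d' and '\d' are the same two characters,
// so doubling the backslash is harmless but unnecessary.
var_dump('\\d' === '\d'); // bool(true)

// '/' is the delimiter and no x modifier is used, so '#' is an ordinary
// literal character: this pattern only matches text like data-id="#ABC012".
$withHash = '/<div data-id="#([A-Z\d]+)" data-index="#(\d+)"/';
$count = preg_match_all($withHash, $html, $unused);
echo "Matches with '#': $count\n"; // Matches with '#': 0

// Without '#', the two capture groups pick up the attribute values.
// PREG_SET_ORDER returns one array per match instead of one per group.
$pattern = '/<div data-id="([A-Z\d]+)" data-index="(\d+)"/';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[0] is the whole match, $m[1] the data-id, $m[2] the data-index.
    $pairs[$m[1]] = (int) $m[2];
    echo "id={$m[1]} index={$m[2]}\n";
}
// Prints:
// id=ABC012 index=123
// id=XYZ9 index=7
```

With the default flag (PREG_PATTERN_ORDER), as in the var_dump output above, $result[1] holds all ids and $result[2] all indexes; PREG_SET_ORDER is just more convenient when you want to loop over one element at a time.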