How to parse XML correctly by specifying the parent of the subcategories?

Time:02-10

I ran into the following problem. I need to pair categories with their subcategories in the genre list from LitRes, and I want to do it with a simple function, without reinventing the wheel. The URL in the code below serves the generated XML with genres. It needs to be parsed so that each child section gets the id attribute of its parent genre as its parent_id.

The XML has this structure:

<genres>
        <genre id="5003" title="бизнес-книги" type="root">
                <genre id="5049" title="банковское дело" token="bankovskoe_delo" type="genre"/>
                <genre id="210646" title="бизнес-справочники" token="business-spravochniki" type="genre"/>
                <genre id="5051" title="бухучет / налогообложение / аудит" token="buhuchet_nalogooblozhenie_audit" type="genre"/>
                <genre id="6784" title="государственное и муниципальное управление" token="gosudarstvennoe_i_munitsipalnoe_upravlenie" type="genre"/>
                <genre id="5060" title="делопроизводство" token="deloproizvodstvo" type="genre"/>
                <genre id="5061" title="зарубежная деловая литература" token="zarubezhnaya_delovaya_literatura" type="genre"/>
                <genre id="5062" title="интернет-бизнес" token="internet" type="genre"/>
                <genre id="5047" title="кадровый менеджмент" token="kadrovyj_menedzhment" type="container">
                        <genre id="5334" title="аттестация персонала" token="attestaciya_personala" type="genre"/>
                        <genre id="5330" title="гендерные различия" token="gendernyye_razlichiya" type="genre"/>
                        <genre id="5332" title="конфликты" token="konflikty" type="genre"/>
                        <genre id="5336" title="коучинг" token="kouching" type="genre"/>
                        <genre id="5333" title="мотивация" token="motivaciya" type="genre"/>
                        <genre id="5335" title="поиск и подбор персонала" token="poisk_presonala_hr" type="genre"/>
                        <genre id="5331" title="тимбилдинг" token="timbilding" type="genre"/>
                        <genre id="6583" title="управление персоналом" token="upravlenie_personalom" type="genre"/>
                </genre>
...
</genres>

PHP Code:

$url = 'https://partnersdnld.litres.ru/genres_list_2/';

$dom = new DOMDocument('1.0', 'utf-8');

$dom->load($url);
$xpath = new DOMXpath($dom);

foreach ($xpath->evaluate('//genre') as $node) {
    var_dump(
        [
            'parent_id' => $xpath->evaluate('string(ancestor::genre[1]/id)', $node),
            'id' => $xpath->evaluate('string(id)', $node),
            'title' => $xpath->evaluate('string(title)', $node),
        ]
    );
}

I got confused between child elements and attributes. Can someone tell me why this outputs empty results, and how to read parent_id and the rest of the data correctly?

CodePudding user response:

Attributes are on a different XPath axis. The expression id is short for child::id, so it looks for a child element node named id. The genre elements have no such child, which is why string(id) returns an empty string. For the attribute axis, use attribute::id or the shortcut @id.
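To make the difference between the two axes concrete, here is a minimal sketch. It assumes a cut-down, two-level sample of the XML from the question, loaded from a string instead of the remote URL; the variable names are only for illustration.

```php
<?php
// Cut-down sample modelled on the XML in the question (an assumption, not the full feed).
$xml = '<genres><genre id="5003" title="бизнес-книги" type="root">'
     . '<genre id="5049" title="банковское дело" token="bankovskoe_delo" type="genre"/>'
     . '</genre></genres>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Use the nested genre element as the context node.
$child = $xpath->evaluate('//genre[@type="genre"]')->item(0);

// Child axis: looks for an <id> element below the context node. There is none.
$viaChildAxis = $xpath->evaluate('string(id)', $child);

// Attribute axis, long form and shortcut.
$viaAttributeAxis = $xpath->evaluate('string(attribute::id)', $child);
$viaShortcut = $xpath->evaluate('string(@id)', $child);

// Attribute of the nearest genre ancestor.
$parentId = $xpath->evaluate('string(ancestor::genre[1]/@id)', $child);

var_dump($viaChildAxis, $viaAttributeAxis, $viaShortcut, $parentId);
```

The first value is an empty string, the next two are "5049", and the last one is "5003".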

Additionally, to fetch an attribute of the context node itself, I would suggest the DOM method getAttribute(). There is no need for XPath in that case:

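// $xpath is the DOMXpath instance created in the question's code
foreach ($xpath->evaluate('//genre') as $node) {
    var_dump(
        [
            // nearest genre ancestor; an empty string for top-level genres
            'parent_id' => $xpath->evaluate('string(ancestor::genre[1]/@id)', $node),
            // attributes of the current node, read directly through DOM
            'id' => $node->getAttribute('id'),
            'title' => $node->getAttribute('title'),
        ]
    );
}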
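If the goal is the "simple function" from the question, the loop can be wrapped up roughly like this. This is only a sketch: collectGenres() is a hypothetical name, the sample XML is a shortened copy of the structure in the question, and it reads the parent through the DOM parentNode property rather than the ancestor axis, which gives the same result for directly nested genre elements. Top-level genres get null as parent_id instead of an empty string.

```php
<?php
// Hypothetical helper: returns one row per <genre> element, in document order.
function collectGenres(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->evaluate('//genre') as $node) {
        $parent = $node->parentNode;
        // Only another <genre> counts as a parent; the <genres> wrapper does not.
        $hasParentGenre = $parent instanceof DOMElement && $parent->nodeName === 'genre';
        $rows[] = [
            'parent_id' => $hasParentGenre ? $parent->getAttribute('id') : null,
            'id' => $node->getAttribute('id'),
            'title' => $node->getAttribute('title'),
        ];
    }
    return $rows;
}

// Shortened sample based on the question's XML (an assumption, not the full feed).
$xml = <<<'XML'
<genres>
  <genre id="5003" title="бизнес-книги" type="root">
    <genre id="5049" title="банковское дело" token="bankovskoe_delo" type="genre"/>
    <genre id="5047" title="кадровый менеджмент" token="kadrovyj_menedzhment" type="container">
      <genre id="5332" title="конфликты" token="konflikty" type="genre"/>
    </genre>
  </genre>
</genres>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$rows = collectGenres($dom);

foreach ($rows as $row) {
    printf("%s <- %s\n", $row['id'], $row['parent_id'] ?? 'none');
}
```

With the real feed you would call $dom->load($url) as in the question and pass the same $dom to the function.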