I need to extract certain attributes from an XML file in which several nodes share the same name but carry a different number of attributes. The file is located here:
And here is a small portion of the file itself:
<boardgames termsofuse="https://boardgamegeek.com/xmlapi/termsofuse">
<boardgame objectid="13">
<yearpublished>1995</yearpublished>
<minplayers>3</minplayers>
<maxplayers>4</maxplayers>
<playingtime>120</playingtime>
<minplaytime>60</minplaytime>
<maxplaytime>120</maxplaytime>
<age>10</age>
<name sortindex="1">Catan</name>
<name primary="true" sortindex="1">CATAN</name>
<name sortindex="1">Catan (Колонизаторы)</name>
<name sortindex="1">Catan telepesei</name>
<name sortindex="1">Catan: Das Spiel</name>
<name sortindex="1">Catan: Die Bordspel</name>
<name sortindex="1">Catan: El Juego</name>
<name sortindex="1">Catan: Gra planszowa</name>
<name sortindex="1">Catan: Il Gioco</name>
<name sortindex="1">Catan: Landnemarnir</name>
I want to extract only the value of "sortindex" from each node named "name". I have tried the following, but for the second "name" node it returns both primary "true" and the sortindex value. I have tried many different approaches, including xmlGetAttr, and I can't get it to work. How do I get this simple operation to work?
data <- read_xml(url)
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
xmlSApply(getNodeSet(xmltop, '//name[@sortindex]'), xmlAttrs)
> xmlSApply(getNodeSet(xmltop, '//name[@primary]'), xmlAttrs)
[,1]
primary "true"
sortindex "1"
Answer:
It sounds like you want to include any name node, even if it doesn't have the attribute. If so, you can try the following:
library(xml2)
library(XML)
data <- read_xml('https://boardgamegeek.com//xmlapi//boardgame//13&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&comments=1&pricehistory=1')
xmlfile <- xmlParse(data)
xmltop <- xmlRoot(xmlfile)
# Return the named attribute of a node, or NA if the node does not have it
getAttr <- function(x, attrName) {
  attrs <- xmlAttrs(x)
  if (attrName %in% names(attrs)) {
    attrs[[attrName]]
  } else {
    NA
  }
}
# One result per name node; nodes without the attribute yield NA
xmlSApply(getNodeSet(xmltop, '//name'), function(x) getAttr(x, "sortindex"))
xmlSApply(getNodeSet(xmltop, '//name'), function(x) getAttr(x, "primary"))
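As a possible shortcut, the XML package's own xmlGetAttr() accepts a default argument, so a hand-written helper may not be needed. The sketch below is only an illustration: it parses a small inline copy of the sample instead of downloading the file, and sample_xml, doc, sortindex and primary are names made up for this example.

```r
library(XML)

# Inline stand-in for the downloaded file (illustrative data only)
sample_xml <- '<boardgames>
  <boardgame objectid="13">
    <name sortindex="1">Catan</name>
    <name primary="true" sortindex="1">CATAN</name>
    <name sortindex="1">Catan: Das Spiel</name>
  </boardgame>
</boardgames>'

doc <- xmlParse(sample_xml, asText = TRUE)

# One value per name node; the default is returned where the attribute is absent
sortindex <- xpathSApply(doc, "//name", xmlGetAttr, "sortindex", default = NA_character_)
primary <- xpathSApply(doc, "//name", xmlGetAttr, "primary", default = NA_character_)

print(sortindex)
print(primary)
```

Because each call asks for a single attribute by name, the second name node contributes only its sortindex value to the first result, and the nodes without primary show up as NA in the second.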
If you don't want to include nodes without the attribute, then you can do something very similar:
library(xml2)
library(XML)
data <- read_xml('https://boardgamegeek.com//xmlapi//boardgame//13&type=boardgame,boardgameexpansion,boardgameaccesory,rpgitem,rpgissue,videogame&versions=1&stats=1&videos=1&marketplace=1&comments=1&pricehistory=1')
xmlfile <- xmlParse(data)
xmltop <- xmlRoot(xmlfile)
xmlSApply(getNodeSet(xmltop, '//name[@sortindex]'), function(x) xmlAttrs(x)[['sortindex']])
xmlSApply(getNodeSet(xmltop, '//name[@primary]'), function(x) xmlAttrs(x)[['primary']])
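If you would rather stay within xml2 and skip the conversion to the XML package, something along these lines should also work. This is a minimal sketch on a small inline copy of the sample rather than the real file; bgg_sample, bgg_doc and the result names are assumptions made for the illustration.

```r
library(xml2)

# Inline stand-in for the downloaded file (illustrative data only)
bgg_sample <- '<boardgames>
  <boardgame objectid="13">
    <name sortindex="1">Catan</name>
    <name primary="true" sortindex="1">CATAN</name>
    <name sortindex="1">Catan: Das Spiel</name>
  </boardgame>
</boardgames>'

bgg_doc <- read_xml(bgg_sample)
name_nodes <- xml_find_all(bgg_doc, "//name")

# xml_attr() returns one value per node and NA where the attribute is missing
all_sortindex <- xml_attr(name_nodes, "sortindex")
all_primary <- xml_attr(name_nodes, "primary")

# Restrict the XPath to keep only the nodes that carry the attribute
only_primary <- xml_attr(xml_find_all(bgg_doc, "//name[@primary]"), "primary")

print(all_sortindex)
print(all_primary)
print(only_primary)
```

The same two calls cover both cases from above: select every name node to get NA for missing attributes, or add the [@primary] predicate to drop those nodes entirely.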