How to parse xml lists and tables in R for BGG API

Time:09-26

I want to write an R Shiny app for my board game collection, and I need to get data from the BoardGameGeek (BGG) API to do this. It's an XML-based API, and it's apparently left to the user to figure everything out. I am not a web programmer and am having difficulty with certain aspects of it. Note that my code may not be the best way to do this. An example page is: https://boardgamegeek.com/xmlapi/boardgame/354242. It is a long page, and I don't want to copy all of it over, so please look at it if I haven't copied enough of it here.

<boardgameintegration objectid="353880" inbound="true">Moly Atrapa</boardgameintegration>
<poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
<results numplayers="1">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="2">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>
<results numplayers="3">
  <result value="Best" numvotes="0"/>
  <result value="Recommended" numvotes="0"/>
  <result value="Not Recommended" numvotes="0"/>
</results>

</poll>
  <poll name="language_dependence" title="Language Dependence" totalvotes="0">
    <results>
      <result level="1" value="No necessary in-game text" numvotes="0"/>
      <result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0"/>
      <result level="3" value="Moderate in-game text - needs crib sheet or paste ups" numvotes="0"/>
      <result level="4" value="Extensive use of text - massive conversion needed to be playable" numvotes="0"/>
      <result level="5" value="Unplayable in another language" numvotes="0"/>
    </results>
 </poll>

My main questions are 1) how to extract the "name", "title", and "totalvotes" properties of each poll (is that what they are called? I can't find this stuff!), and 2) how to do the same for the individual results under each "numplayers" value. My code so far looks like this, but it has only gotten me as far as extracting things like the publication year.

library(XML)
library(methods)
library(xml2)
library(rvest)

data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/354242")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)
xmltop[['boardgame']][['yearpublished']][1]$text # gets the year published

CodePudding user response:

To retrieve those attributes, consider the XML package's undocumented xmlAttrsToDataFrame function, which is not exported but is accessible with the triple-colon operator:

library(XML) 

url <- "https://boardgamegeek.com/xmlapi/boardgame/354242"

doc <- xmlParse(readLines(url))

poll_df <- XML:::xmlAttrsToDataFrame(getNodeSet(doc, '//poll'))
poll_df
#                   name                            title totalvotes
# 1 suggested_numplayers User Suggested Number of Players          0
# 2  language_dependence              Language Dependence          0
# 3  suggested_playerage        User Suggested Player Age          0

results_dfs <- lapply(
  getNodeSet(doc, '//poll[@name="suggested_numplayers"]/results'),
  function(x) data.frame(
    numplayers = xmlAttrs(x)["numplayers"],
    XML:::xmlAttrsToDataFrame(xmlChildren(x)),
    row.names = NULL
  )
)

result_df <- do.call(rbind, results_dfs)
result_df
#    numplayers           value numvotes
# 1           1            Best        0
# 2           1     Recommended        0
# 3           1 Not Recommended        0
# 4           2            Best        0
# 5           2     Recommended        0
# 6           2 Not Recommended        0
# 7           3            Best        0
# 8           3     Recommended        0
# 9           3 Not Recommended        0
# 10          4            Best        0
# 11          4     Recommended        0
# 12          4 Not Recommended        0
# 13          5            Best        0
# 14          5     Recommended        0
# 15          5 Not Recommended        0
# 16          6            Best        0
# 17          6     Recommended        0
# 18          6 Not Recommended        0
# 19         6+            Best        0
# 20         6+     Recommended        0
# 21         6+ Not Recommended        0
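
Since xmlAttrsToDataFrame is undocumented and could change between releases, here is a rough sketch of the same idea using only documented xml2 functions (read_xml, xml_find_all, xml_attr). It runs against a small inline sample trimmed from the question's XML rather than the live URL. The <boardgames>/<boardgame> wrapper elements and the sample_xml name are my assumptions for illustration, so adjust the XPath if the real response is shaped differently.

```r
library(xml2)

# Inline sample trimmed from the question's XML (no network call needed).
# Assumption: the wrapper elements <boardgames>/<boardgame> are illustrative.
sample_xml <- '<boardgames>
  <boardgame objectid="354242">
    <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
      <results numplayers="1">
        <result value="Best" numvotes="0"/>
        <result value="Recommended" numvotes="0"/>
        <result value="Not Recommended" numvotes="0"/>
      </results>
      <results numplayers="2">
        <result value="Best" numvotes="0"/>
        <result value="Recommended" numvotes="0"/>
        <result value="Not Recommended" numvotes="0"/>
      </results>
    </poll>
    <poll name="language_dependence" title="Language Dependence" totalvotes="0">
      <results>
        <result level="1" value="No necessary in-game text" numvotes="0"/>
      </results>
    </poll>
  </boardgame>
</boardgames>'

doc <- read_xml(sample_xml)

# One row per <poll>, built from its attributes.
polls <- xml_find_all(doc, "//poll")
poll_df <- data.frame(
  name       = xml_attr(polls, "name"),
  title      = xml_attr(polls, "title"),
  totalvotes = as.integer(xml_attr(polls, "totalvotes")),
  stringsAsFactors = FALSE
)

# One small data frame per <results> group; the group's numplayers value
# is recycled across that group's <result> children.
groups <- xml_find_all(doc, '//poll[@name="suggested_numplayers"]/results')
per_group <- lapply(groups, function(node) {
  res <- xml_find_all(node, "./result")
  data.frame(
    numplayers = xml_attr(node, "numplayers"),
    value      = xml_attr(res, "value"),
    numvotes   = as.integer(xml_attr(res, "numvotes")),
    stringsAsFactors = FALSE
  )
})
result_df <- do.call(rbind, per_group)

print(poll_df)
print(result_df)
```

To use it on the real data, replace sample_xml with the API URL in the read_xml call. numplayers is kept as character on purpose, because the API can return values such as "6+" that do not convert to integers.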

CodePudding user response:

I think you need xmlAttrs (so the term to search for is "attributes"):

# used a different entry:
data <- read_xml("https://boardgamegeek.com/xmlapi/boardgame/78985")
xmlfile <- xmlParse(data)
xmltop = xmlRoot(xmlfile)

xmlAttrs(xmltop[['boardgame']][['poll']])
                              name 
            "suggested_numplayers" 
                             title 
"User Suggested Number of Players" 
                        totalvotes 
                               "0" 

And you can extract from that vector of attributes:

xmlAttrs(xmltop[['boardgame']][['poll']])['name']
                  name 
"suggested_numplayers" 

If you assign the output to a variable, you can confirm that it is a named character vector:

> attrs <- xmlAttrs(xmltop[['boardgame']][['poll']])
> str(attrs)
 Named chr [1:3] "suggested_numplayers" ...
 - attr(*, "names")= chr [1:3] "name" "title" "totalvotes"
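
One caveat with the [['poll']] indexing above: as far as I know it only returns the first child named poll, so the other polls are not reached that way. A possible sketch for getting all of them with the XML package is to use XPath via xpathSApply and xmlGetAttr. The inline sample below is trimmed from the question's XML, and the <boardgames>/<boardgame> wrapper and the variable names are assumptions for illustration.

```r
library(XML)

# Inline sample trimmed from the question's XML (no network call needed).
# Assumption: the wrapper elements <boardgames>/<boardgame> are illustrative.
sample_xml <- '<boardgames>
  <boardgame objectid="354242">
    <poll name="suggested_numplayers" title="User Suggested Number of Players" totalvotes="0">
      <results numplayers="1">
        <result value="Best" numvotes="0"/>
      </results>
    </poll>
    <poll name="language_dependence" title="Language Dependence" totalvotes="0">
      <results>
        <result level="1" value="No necessary in-game text" numvotes="0"/>
        <result level="2" value="Some necessary text - easily memorized or small crib sheet" numvotes="0"/>
      </results>
    </poll>
  </boardgame>
</boardgames>'

doc <- xmlParse(sample_xml, asText = TRUE)

# Pull a single attribute from every <poll> node, not just the first one.
poll_names <- xpathSApply(doc, "//poll", xmlGetAttr, "name")
poll_votes <- as.integer(xpathSApply(doc, "//poll", xmlGetAttr, "totalvotes"))

# Same idea for the <result> nodes of one specific poll.
ld_nodes <- getNodeSet(doc, '//poll[@name="language_dependence"]/results/result')
ld_df <- data.frame(
  level    = as.integer(sapply(ld_nodes, xmlGetAttr, "level")),
  value    = sapply(ld_nodes, xmlGetAttr, "value"),
  numvotes = as.integer(sapply(ld_nodes, xmlGetAttr, "numvotes")),
  stringsAsFactors = FALSE
)

print(poll_names)
print(ld_df)
```

The attribute values always come back as character strings, which is why the counts are wrapped in as.integer before use.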

You might find this presentation helpful:

https://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/XML.pdf
