Reading in a Table Using rvest

Time:07-12

This is a link to a page with a table of ~290 vine plant names:

https://www.forestryimages.org/browse/catsubject.cfm?cat=51

I am trying to read in the table and keep only the Common Name column. I have tried doing this with the rvest library like so:

library(rvest)
vine_web <- "https://www.forestryimages.org/browse/catsubject.cfm?cat=51"
vine_names <- vine_web %>%
  read_html() %>%
  html_table()

It reads the column names, but not the contents of the table. I have tried several variations using html_nodes, html_element, copying the CSS selector, and even the XPath.

I always end up with this as a result:

[[1]]
# A tibble: 1 x 4
  `Subject Number` `Common Name` `Scientific Name` `Number Of Images`
  <lgl>            <lgl>         <lgl>             <lgl>             
1 NA               NA            NA                NA                

The table is in a dynamic format, which leads me to believe that html_table() may need to be altered or may be the inappropriate function to use here. I would like to know if there is a way to read this table into R.

CodePudding user response:

It appears the table is filled in by JavaScript after the page loads, so read_html() only sees the empty table with its headers. There is a workaround: download the data in JSON form. If you inspect the page and go to the Network tab, you will find the JSON link that the page requests. Let me know if this answers your question.

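library(jsonlite)
# The URL below was copied from the browser's Network tab. In the DataTables
# request format, start is the row offset and length is the number of rows.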
json_data <- jsonlite::fromJSON("https://api.bugwood.org/rest/api/subject/.json?fmt=datatable&include=count&cat=51&systemid=2&draw=2&columns[0][data]=0&columns[0][searchable]=false&columns[0][orderable]=false&columns[0][search][value]=&columns[1][data]=1&columns[1][searchable]=true&columns[1][orderable]=true&columns[1][search][value]=&columns[2][data]=2&columns[2][searchable]=true&columns[2][orderable]=true&columns[2][search][value]=&columns[3][data]=3&columns[3][searchable]=false&columns[3][orderable]=true&columns[3][search][value]=&order[0][column]=1&order[0][dir]=asc&start=163&length=126&search[value]=&_=1657572710039")
as.data.frame(json_data$data)
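
Since the question only needs the Common Name column, the data frame can be given column names and then subset. The following is a minimal sketch and not part of the original answer. It assumes `json_data$data` comes back as a character matrix whose four columns follow the order of the table headers on the page (Subject Number, Common Name, Scientific Name, Number Of Images). The helper `extract_common_names` and the sample rows are hypothetical stand-ins, so the snippet runs without a network call.

```r
# Hypothetical helper: label the columns of the payload and return only
# the common names.
# Assumption: `data_matrix` has four columns in the same order as the
# headers shown on the web page.
extract_common_names <- function(data_matrix) {
  vine_table <- as.data.frame(data_matrix, stringsAsFactors = FALSE)
  # Fail loudly if the payload does not have the assumed shape.
  stopifnot(ncol(vine_table) == 4)
  names(vine_table) <- c("Subject Number", "Common Name",
                         "Scientific Name", "Number Of Images")
  vine_table[["Common Name"]]
}

# Made-up rows standing in for json_data$data (not real values from the API).
sample_data <- matrix(
  c("1", "example vine A", "Genus speciesa", "10",
    "2", "example vine B", "Genus speciesb", "3"),
  ncol = 4, byrow = TRUE
)

common_names <- extract_common_names(sample_data)
print(common_names)
```

With the real response, the call would be `extract_common_names(json_data$data)`. If the payload turns out to have a different shape, the `stopifnot` check stops with an error instead of silently mislabeling the columns.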