Get table from html with htmltab


I am trying to get a table from a website into R. The code that I am currently running is:

library(htmltab)
url1 <- 'https://covid19-dashboard.ages.at/dashboard_Hosp.html'
TAB <- htmltab(url1, which = "//table[@id = 'tblIcuTimeline']")

This selects the correct table (the variables are the ones I want), but the table is empty. It might be a problem with my XPath. The error that I am getting is:

No encoding supplied: defaulting to UTF-8. Error in Node[[1]] : subscript out of bounds

CodePudding user response:

This website has a convenient JSON file available, which you can extract like so:

library(jsonlite)
url <- "https://covid19-dashboard.ages.at/data/JsonData.json"
ll <- jsonlite::fromJSON(txt = url)

From there you can subset and extract what you want. My guess is that you are after ll$CovidFallzahlen. My German is not so good, so I couldn't isolate the exact values you are after.
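
If you want to check the structure before subsetting, something like the following can help. This is only a sketch: the element name CovidFallzahlen is just the guess from above, so verify it against names(ll), since the file's layout may change.

# ll is the list returned by jsonlite::fromJSON() above
names(ll)                   # list the top-level elements of the parsed JSON
str(ll$CovidFallzahlen)     # inspect the guessed element before subsetting
head(ll$CovidFallzahlen)    # first rows, if it parsed as a data frame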

CodePudding user response:

The problem is (probably) that the table is empty in the HTML the server delivers and only gets filled by the page's scripts on load. So when you fetch the page directly (as your code does), the table is still empty.

Below is an RSelenium approach that results in a list all.table with all filled tables. Pick the one you need.
Requirement: Firefox is installed.

library(RSelenium)
library(rvest)
library(xml2)

# set up driver, client and server
driver <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
server <- driver$server
browser <- driver$client

# go to the url in the browser
browser$navigate("https://covid19-dashboard.ages.at/dashboard_Hosp.html")

# get all tables from the rendered page
doc <- xml2::read_html(browser$getPageSource()[[1]])
all.table <- rvest::html_table(doc)

# close everything down properly
browser$close()
server$stop()
# needed, else the port 4545 stays occupied by the java process
system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)